The Second-Order Coding Rate of the MIMO 
Rayleigh Block-Fading Channel 

Jakob Hoydis, Romain Couillet, and Pablo Piantanida 



Abstract 

We study the second-order coding rate of the multiple-input multiple-output (MIMO) Rayleigh 
block-fading channel via statistical bounds from information spectrum methods and Gaussian tools from 
random matrix theory. Based on an asymptotic analysis of the information density which considers the 
simultaneous growth of the block length n and the number of transmit and receive antennas K and N, 
respectively, we derive closed-form upper and lower bounds on the optimal average error probability 
when the code rate is within $O(1/\sqrt{nK})$ of the asymptotic capacity. A Gaussian approximation is
then used to establish an upper bound on the error probability for arbitrary code rates which is shown 
by simulations to be accurate for small N, K, and n. A comparison to practical low-density parity- 
check (LDPC) codes reveals a striking similarity between the empirical and theoretical slopes of the 
error-probability curve, seen as functions of n or the signal-to-noise ratio (SNR). This allows one to 
predict in practice by how much n or the SNR must be increased to realize a desired error probability 
improvement. 

I. Introduction

Channel capacity is the maximal rate at which communication with vanishing error probability 
is possible, provided that the length of each codeword is allowed to grow without limit. Owing to 
the celebrated Shannon theory [1], this asymptotic regime of communication is well understood.
For more than a decade, it has also been known how to construct practical codes that achieve the capacity
(for certain channels and arbitrarily long codewords) [2]. Nevertheless, for
real-world applications, the codeword (or block) length is naturally limited due to delay and 
complexity constraints. Thus, it is unfortunate that much less is known about the performance 
limits of communication in the finite block-length regime, where only bounds on the optimal 
error probability for a given code rate and block length are available, e.g., 0, flU. In addition, 
these bounds are in general difficult to analyze and to evaluate. As a consequence, code design 
for short block length is rendered difficult because no simple comparison to the theoretical 
optimum can be made. This is in particular the case for non-ergodic channels (e.g., quasi-static 
or block-fading channels) for which the error probability is fundamentally limited by the outage 
probability A3, and even more for multiple-input multiple-output (MIMO) channels for which 
the available bounds are even less tractable. The aim of this paper is to provide closed-form 
bounds on the optimal average error probability for the MIMO Rayleigh block-fading channel, 
assuming that coding is done over n channel uses during which the channel takes a random but 
constant realization. These bounds take either the form of asymptotically exact expressions in the 
regime where all system dimensions grow infinitely large or the form of accurate approximations 
in the finite-dimensional regime. 

A. Related Work 

One of the fundamental quantities of interest when exploring the tradeoff between code rate, 
error probability, and block length is the mutual information density [0 (or the information 
spectrum). This quantity was used by Feinstein [3] and Shannon [7] who were among the first to 
develop bounds on the optimal error probability in the finite block-length regime. Bounds on the 
limit of the scaled logarithm of the error probability, known as the exponential rate of decrease, 
were derived in 01. A simpler formula for the latter was then provided by Gallager [8] which is 
still difficult to evaluate for practical channel models. In 10, an explicit expression of Gallager's 
error exponent was found for the block-fading MIMO channel. However, the computation of
this result remains quite involved. 

Since the aforementioned bounds are in general not amenable to simple evaluation, asymptotic 
considerations were made, in particular by Strassen [10], who derived a general expression
for the discrete memoryless channel with unconstrained inputs in the regime where the code
rate is within $O(1/\sqrt{n})$ of the capacity. In his work, the variance of the mutual information
density appears as a fundamental quantity in a Gaussian approximation of the error probability. 
Nevertheless, Strassen's approach could not be generalized to channels with input constraints, 
such as the additive white Gaussian noise (AWGN) channel. Hayashi [11] focused further on this
so-called second-order coding rate and provided an exact characterization of the optimal error
probability for different channel models and input constraints. Further considerations were made
by Polyanskiy, Poor, and Verdú in [12], which provides, in particular, new upper and lower bounds
on the maximal rate achievable for a given error probability and block length, investigated on
several memoryless channels. Along the same lines, the scalar AWGN block-fading channel was
addressed in the coherent and non-coherent settings in [13] and [14], respectively. Additional
work on the asymptotic block-length regime via information spectrum methods comprises the
general capacity formula by Verdú and Han [15], using a lower bound on the error probability from
[16], [17] in the converse proof. A very comprehensive literature survey on related aspects can
also be found in [12].

Apart from the aforementioned information theoretic considerations, there is a significant body 
of literature dealing with the design of good codes for short block length and their (semi-) analytic 
performance evaluation. In a series of papers [18], [19], [20], the authors develop scaling laws
for the block- and bit-error probability of iteratively decoded graph-codes. Their idea is based 
on the observation that the code performance undergoes a phase transition from the error floor 
region (at very low signal-to-noise ratios (SNRs)) to the waterfall region (after the threshold of 
the ensemble). Thus, following a conjecture from statistical physics, it is possible to express the 
error performance around the threshold by a well defined scaling law. This law depends on a 
few parameters which can be found by density and covariance evolution [21], [20]. Lastly, the
code design for non-ergodic block-fading channels was addressed in [22].

B. Contribution and outline

In this paper, we investigate the finite block-length regime of the MIMO Rayleigh block-fading 
channel. The focus is on the asymptotic behavior of the error probability when the coding rate 
is a small perturbation of the ergodic capacity, therefore following the works of [11] on the
second-order coding rate (see also [12, Section IV]). However, block-fading channels, be they
single-input single-output (SISO) or MIMO, are inherently non-ergodic so that the second-order 
coding rate as introduced in [FTP is ill-defined. To circumvent this issue, iTOl assumes coding 
over a large number of independent realizations of increasingly large block-fading channels, 
which makes the overall channel ergodic (the article is actually restricted to SISO channels but 
would easily adapt to the MIMO case). In the present article, we take the approach of inducing 
ergodicity by growing the channel matrix dimensions. Indeed, assuming an $N \times K$ channel
matrix with independent standard Gaussian entries, letting $K, N \to \infty$, the channel becomes
ergodic in the limit (even for a single channel use). This ensures that communications at rates 
arbitrarily close to the ergodic capacity are possible in this regime and it becomes natural to 
investigate the optimal average error probability for the second-order coding rate when K, N, 
and the block length n grow simultaneously, i.e., the asymptotically achievable error probability 
for rates within $O(1/\sqrt{nK})$ of the ergodic capacity.

Our approach to characterize the optimal average error probability for the second-order coding
rate closely follows the information spectrum methodology of [11]. We specifically derive lower
and upper bounds for the error probability, starting from adaptations of Feinstein's lemma for the
upper bound (Lemma 1 in Section III) and Verdú-Han's lemma for the lower bound (Lemma 3
in Appendix A), respectively. The method naturally leads to considering second-order statistics
of the information density, seen as a real functional of three large-dimensional random matrices
(the $N \times K$ channel, the $K \times n$ input, and the $N \times n$ noise matrices). Such statistics are
very demanding to obtain and call for tools from random matrix theory. We rely here on the
Gaussian tools developed by Pastur [23], which consist of an integration by parts formula to
evaluate expectations of functionals of Gaussian matrices and the Poincaré-Nash inequality to
control the variance of such functionals. Similar to [11], a specific technical difficulty of this
analysis arises because the information density (when properly centered and scaled) may or may
not have a Gaussian weak limit, depending on the input distribution. While this is not a major
problem for the characterization of the upper bound, it demands extreme care when dealing
with the lower bound. The results we obtain are then used to derive an approximation of the 
optimal average error probability for finite block lengths. Those are compared against the error 
probability achieved by practical low-density parity-check (LDPC) codes. 
Our main contributions are summarized below: 

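1) We prove that the optimal average error probability $\mathbb{P}_e(r|\beta,c)$ for second-order coding rate
$r \le 0$ and for $\beta = \lim n/K$, $c = \lim N/K$, can be bounded as

$$\Phi\left(\frac{r}{\theta_-}\right) \le \mathbb{P}_e(r|\beta,c) \le \Phi\left(\frac{r}{\theta_+}\right) \qquad (1)$$

where $\Phi$ is the Gaussian distribution function, and $\theta_+ > \theta_-$ are functions of $\beta$, $c$, and
the SNR which are given in closed form. As opposed to [11], [12], it is not possible to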
obtain matching lower and upper bounds due to the presence of the random channel matrix. 
Nonetheless, it appears that the distance between these bounds is very small for practical 
settings. In particular, the bounds are asymptotically tight in the low-SNR regime. 

2) We derive an approximation of Feinstein's upper bound of the error probability $\mathbb{P}_e^{(n)}(R)$
for code rate R and inputs satisfying an average energy constraint. We conjecture with
supporting arguments that the approximation error is of order $O(1/n)$.

3) We compare the aforementioned finite-n upper-bound approximation against a practical
quadrature phase-shift keying (QPSK)-based LDPC code on a 16 × 8 channel with various
values for n and observe that, despite a 4 dB SNR-gap which is normal for such codes
and MIMO detection, the slopes of the error probability as a function of n or the SNR
match our theoretical bounds. We further highlight a striking resemblance between our
bound and the canonical scaling law in [20].

The results of this work are therefore useful to determine by only a few system parameters the 
required code rate to ensure a certain error probability. Apart from its theoretical and practical 
contributions, we believe that this article introduces a new approach which takes advantage 
of both information spectrum and random matrix methods, and allows one to study involved 
statistics of MIMO communication systems beyond the classical ergodic and outage capacities. 





Notation and definitions

The set of nonnegative integers is denoted by $\mathbb{N}$ and $\mathbb{N}^* = \mathbb{N} \setminus \{0\}$. The real, nonnegative real,
nonpositive real, and complex fields are denoted by $\mathbb{R}$, $\mathbb{R}^+$, $\mathbb{R}^-$, and $\mathbb{C}$, respectively. Boldface
letters x and upper-case letters X are used to denote vectors and matrices, respectively. The
transpose, complex conjugate, and complex conjugate (Hermitian) transpose are denoted by
$(\cdot)^T$, $(\cdot)^*$, and $(\cdot)^H$, respectively. The trace and determinant of a square matrix X are written
tr X and det(X), respectively. The spectral norm of a square matrix X, i.e., the absolute largest
eigenvalue, is denoted by $\|X\|$. $X_{ij}$ or $[X]_{ij}$ denotes the (i, j)-element of X. Random vectors
and matrix variables are denoted by lowercase letters x and uppercase letters X, respectively.
The symbol Pr[·] denotes the probability of the bracketed random argument. For a set $\mathcal{S}$, we
denote by $\mathcal{P}(\mathcal{S})$ the set of probability measures with support in $\mathcal{S}$, i.e., $P \in \mathcal{P}(\mathcal{S})$ is equivalent
to $P(\mathcal{S}) = 1$. We also denote by supp(P) the support of P.

For random matrices X, Y in $\mathbb{C}^{K \times n}$ and $\mathbb{C}^{N \times n}$, for some integers N, K, n, let $P_X \in \mathcal{P}(\mathbb{C}^{K \times n})$
and let $X \mapsto P_{Y|X}(\cdot|X)$ be any Borel measurable mapping. We define the probability measure
$P_{XY}$ by

$$P_{XY}(A \times B) = \int_A P_{Y|X}(B|X) \, P_X(dX) \qquad (2)$$

where A, B are Borel sets of $\mathbb{C}^{K \times n}$ and $\mathbb{C}^{N \times n}$, respectively. Similarly, we define the distribution
$P_Y$ as

$$P_Y(B) = \int P_{Y|X}(B|X) \, P_X(dX) \qquad (3)$$

for any Borel subset $B \subset \mathbb{C}^{N \times n}$, where the integral is understood to be taken over the whole
space $\mathbb{C}^{K \times n}$. We also define, for a $P_X$-measurable functional f,

$$\mathbb{E}[f(X)] = \int f(X) \, P_X(dX) \qquad (4)$$

and the variance of f(X) as

$$\mathrm{Var}[f(X)] = \mathbb{E}\left[ |f(X) - \mathbb{E}[f(X)]|^2 \right]. \qquad (5)$$

Let P and Q be two measures on (the Borel σ-field of) $\mathbb{C}^{K \times n}$. Then P is said to be absolutely
continuous with respect to Q (or dominated by Q) if P(A) = 0 for every Borel set A for which
Q(A) = 0. This is written as $P \ll Q$. For such measures P and Q, we denote

$$\frac{dP}{dQ}(X) = \frac{P(dX)}{Q(dX)} \qquad (6)$$

the Radon-Nikodym derivative [24, Theorem 32.2] of P with respect to Q at position X, i.e.,
for any Borel set A,

$$P(A) = \int_A \frac{dP}{dQ}(X) \, Q(dX) = \int_A \frac{P(dX)}{Q(dX)} \, Q(dX). \qquad (7)$$

The notation $P(dX) \le Q(dX)$ will then be understood as

$$\frac{dP}{dQ}(X) = \frac{P(dX)}{Q(dX)} \le 1. \qquad (8)$$

If P is not absolutely continuous with respect to Q, we set $dP/dQ = \infty$ and $P(dX) \le Q(dX)$
is understood as an always false statement.

The weak convergence of the sequence of probability measures $\{\mu_n\}_{n=1}^{\infty}$ to $\mu$ is denoted by
$\mu_n \Rightarrow \mu$.

We denote by $\mathcal{CN}(0, \sigma^2)$ the complex circularly symmetric normal distribution with zero mean
and variance $\sigma^2$. We call $\Phi$ the distribution function of the real standard normal distribution,
given by

$$\Phi(x) \triangleq \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left(-\frac{t^2}{2}\right) dt. \qquad (9)$$

The notation $f_n(\cdot) = O(g_n(\cdot))$ as $n \to \infty$ denotes that there exists a constant C, independent
of n and the arguments of $g_n(\cdot)$, such that $|f_n(\cdot)| \le C |g_n(\cdot)|$ for all n. Similarly, $f_n(\cdot) = o(g_n(\cdot))$
as $n \to \infty$ denotes that, for every $\varepsilon > 0$, there exists a positive constant $n_\varepsilon$ such that
$|f_n(\cdot)| \le \varepsilon |g_n(\cdot)|$ for all $n > n_\varepsilon$.

II. Channel model and problem statement 
Consider the following MIMO memoryless Gaussian fading channel:

$$y_t = \frac{1}{\sqrt{K}} H^n x_t + \sigma w_t, \qquad t = \{1, \dots, n\} \qquad (10)$$

where $y_t \in \mathbb{C}^{N \times 1}$ is the channel output at time t, $H^n \in \mathbb{C}^{N \times K}$ is a realization of the random
channel matrix $H^n \in \mathbb{C}^{N \times K}$ whose entries are independent and identically distributed (i.i.d.)
$\mathcal{CN}(0,1)$ and the index n is a reminder that $H^n$ is constant for the duration of n channel uses,
$x_t \in \mathbb{C}^{K \times 1}$ is the realization of the random channel input $x_t \in \mathbb{C}^{K \times 1}$ at time t, and $\sigma w_t$ is
the realization of the random noise vector $\sigma w_t$ at time t whose entries are i.i.d. $\mathcal{CN}(0, \sigma^2)$.
The transmitter end has only statistical knowledge about $H^n$ while the receiver end knows $H^n$
perfectly. In particular, we will assume $H^n$, $x_t$, and $w_t$ to be independent for each t.
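
To make the block-fading structure of the channel model concrete, the following is a minimal simulation sketch using only the Python standard library. It assumes the reading $y_t = K^{-1/2} H x_t + \sigma w_t$ of (10); the helper names `cn_sample` and `simulate_block` are hypothetical and not part of the paper.

```python
import math
import random

def cn_sample(rng, variance=1.0):
    """Draw one circularly symmetric complex normal sample CN(0, variance)."""
    s = math.sqrt(variance / 2.0)  # real and imaginary parts each carry half the variance
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def simulate_block(N, K, X, sigma, rng):
    """Simulate one fading block of the channel model.

    X is a list of n input vectors, each of length K. A single N x K channel
    matrix H with i.i.d. CN(0, 1) entries is drawn once and reused for all
    n channel uses (block fading). Returns (H, Y), where Y is a list of n
    output vectors, each of length N.
    """
    H = [[cn_sample(rng) for _ in range(K)] for _ in range(N)]
    Y = []
    for x in X:
        y = [sum(H[i][k] * x[k] for k in range(K)) / math.sqrt(K)
             + sigma * cn_sample(rng)
             for i in range(N)]
        Y.append(y)
    return H, Y
```

A call such as `simulate_block(4, 2, X, 0.5, random.Random(0))` with `X` holding n Gaussian input vectors produces one block; drawing a fresh `H` per call is what makes the channel non-ergodic over a single codeword.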
We define the following matrices: $X^n = (x_1, \dots, x_n) \in \mathbb{C}^{K \times n}$, $W^n = (w_1, \dots, w_n) \in \mathbb{C}^{N \times n}$,
and $Y^n = (y_1, \dots, y_n) \in \mathbb{C}^{N \times n}$. Associated to these matrices, we define the corresponding random matrices
$X^n$, $W^n$, and $Y^n$, built in the same way from the random vectors $x_t$, $w_t$, and $y_t$.
For $\Gamma > 0$, we denote

$$\mathcal{S}_\Gamma^n \triangleq \left\{ X^n \in \mathbb{C}^{K \times n} : \frac{1}{nK} \mathrm{tr}\, X^n (X^n)^H \le \Gamma \right\} \qquad (11)$$

i.e., the set of inputs $X^n$ with energy constraint $\Gamma$.

The mutual information density of $P_{Y^n|X^n,H^n}$, the probability measure of $Y^n$ conditioned
on X n and H n , is defined by (see e.g. [0 for the AWGN definition) 

$$I_{N,K}^{(n)} \triangleq \frac{1}{nK} \log \frac{P_{Y^n|X^n,H^n}(dY^n|X^n,H^n)}{P_{Y^n|H^n}(dY^n|H^n)} \qquad (12)$$

where, for given $X^n$, $H^n$, the ratio $P_{Y^n|X^n,H^n}(\cdot|X^n,H^n)/P_{Y^n|H^n}(\cdot|H^n)$ denotes the
Radon-Nikodym derivative of the measure $P_{Y^n|X^n,H^n}(\cdot|X^n,H^n)$ with respect to $P_{Y^n|H^n}(\cdot|H^n)$ whenever
$P_{Y^n|X^n,H^n}(\cdot|X^n,H^n) \ll P_{Y^n|H^n}(\cdot|H^n)$ and is set to $\infty$ otherwise.

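Of particular importance is the case of independent Gaussian inputs, i.e., $x_t \sim \mathcal{CN}(0, \Gamma I_K)$,
for which the mutual information density takes the form

$$I_{N,K}^{(n)} = C_{N,K} + R_{N,K}^{(n)} \qquad (13)$$

where

$$C_{N,K} \triangleq \frac{1}{K} \log\det\left( I_N + \frac{\Gamma}{\sigma^2 K} H^n (H^n)^H \right) \qquad (14)$$

$$R_{N,K}^{(n)} \triangleq \frac{1}{nK} \mathrm{tr}\left[ \left( \frac{\Gamma}{K} H^n (H^n)^H + \sigma^2 I_N \right)^{-1} Y^n (Y^n)^H - W^n (W^n)^H \right]. \qquad (15)$$

However, note that such Gaussian inputs do not all satisfy the energy constraint in (11).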

Definition 1 (Code and average error probability): A $(P_e^{(n)}, M_n, \Gamma)$-code $\mathcal{C}_n$ for the channel
model (10) with power constraint (11) consists of the following mappings:
• An encoder mapping:

$$\varphi : \mathcal{M}_n \to \mathbb{C}^{K \times n}. \qquad (16)$$

The transmitted symbols are $X_m^n = \varphi(m) \in \mathcal{S}_\Gamma^n$ for every message m uniformly distributed
over the set $\mathcal{M}_n = \{1, \dots, M_n\}$ of messages.
• A set of decoder mappings $\{\phi_{H^n}\}_{H^n \in \mathbb{C}^{N \times K}}$ with:

$$\phi_{H^n} : \mathbb{C}^{N \times n} \to \mathcal{M}_n \cup \{e\} \qquad (17)$$

which produces the decoder's decision $\hat{m} = \phi_{H^n}(Y_m^n)$, $Y_m^n = \frac{1}{\sqrt{K}} H^n \varphi(m) + \sigma W^n$, on the
transmitted message m, or the error event e.
For a code $\mathcal{C}_n$ with block length n, codebook size $M_n$, encoder $\varphi$, and decoder $\{\phi_{H^n}\}_{H^n \in \mathbb{C}^{N \times K}}$,
the average error probability is defined as

$$P_e^{(n)}(\mathcal{C}_n) \triangleq \frac{1}{M_n} \sum_{m=1}^{M_n} \Pr[\hat{m} \ne m \mid m] \qquad (18)$$

where the probability is taken over the random variables $W^n$ and $H^n$.

Let supp($\mathcal{C}_n$) denote the codebook $\{\varphi(1), \dots, \varphi(M_n)\}$. The optimal average error probability
for the rate R with energy constraint $\Gamma$ is defined as

$$\mathbb{P}_e^{(n)}(R) \triangleq \inf_{\mathcal{C}_n :\, \mathrm{supp}(\mathcal{C}_n) \subseteq \mathcal{S}_\Gamma^n} \left\{ P_e^{(n)}(\mathcal{C}_n) \,\middle|\, \frac{1}{nK} \log M_n \ge R \right\}. \qquad (19)$$

The exact characterization of $\mathbb{P}_e^{(n)}(R)$ for fixed n, K, and N is generally intractable. As
mentioned in the introduction, a classical approach consists in considering rates within $O(1/\sqrt{n})$
of the ergodic capacity with block lengths n growing to infinity (i.e., second-order coding
rates). This leads to tractable limiting error probabilities, referred to as optimal average error
probabilities for the second-order coding rates [11], [12]. However, as the ergodic capacity is
ill-defined for block-fading channels, we assume here that the system dimensions K and N grow
large. This induces ergodicity in the channel and entails a new definition of the second-order
coding rate and the optimal average error probability for the block-fading MIMO channel.

Precisely, since K and N cannot be assumed to grow faster than n for practical reasons, we
assume that all three parameters are large but of the same order of magnitude. This is expressed
mathematically via the relations

$$\frac{n}{K} = \beta + o(n^{-2}) \qquad (20)$$

$$\frac{N}{K} = c + o(n^{-2}) \qquad (21)$$

for some constants $\beta, c > 0$ as $n \to \infty$.¹ These relations will be denoted by $n \xrightarrow{(\beta,c)} \infty$ in the
remainder of the article. Under this limiting regime, the per-antenna capacity of the channel
converges for almost every channel realization to an asymptotic limit C [25]. We can then
characterize the error probability in the second-order coding rate, i.e., when the coding rate
is within $O(1/\sqrt{nK})$ of the limiting capacity C. For technical reasons, it is essential that the
limiting ratios $\beta$ and c are reached sufficiently fast to avoid asymptotic fluctuations of the ratios
n/K and N/K in the central limit theorems (CLTs) derived hereafter. In practice, one will
simply take $N/K = c$, $n/K = \beta$, and assume n, K, N large enough.

¹ It is easy to see that these constraints impose c and $\beta$ to be rational numbers and the sequences $\{N/K\}_{n=1}^{\infty}$ and $\{n/K\}_{n=1}^{\infty}$
to be constant for all large n.

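Under these assumptions, $\mathbb{P}_e^{(n)}(R)$ is estimated via the following limiting error probability:

Definition 2: The optimal average error probability for the second-order coding rate r with
input energy constraint $\Gamma$ is defined as

$$\mathbb{P}_e(r|\beta,c,\Gamma) \triangleq \inf_{\{\mathcal{C}_n :\, \mathrm{supp}(\mathcal{C}_n) \subseteq \mathcal{S}_\Gamma^n\}_{n=1}^{\infty}} \left\{ \limsup_{n \xrightarrow{(\beta,c)} \infty} P_e^{(n)}(\mathcal{C}_n) \,\middle|\, \liminf_{n \xrightarrow{(\beta,c)} \infty} \sqrt{nK} \left( \frac{1}{nK} \log M_n - C \right) \ge r \right\} \qquad (22)$$

where

$$C \triangleq \liminf_{n \xrightarrow{(\beta,c)} \infty} \mathbb{E}[C_{N,K}] \qquad (23)$$

with $C_{N,K}$ as given in (14).

The main objective of this article is to characterize $\mathbb{P}_e(r|\beta,c,\Gamma)$. Without loss of generality
and for simplicity of exposition, we take $\Gamma = 1$ from now on and denote $\mathcal{S}^n = \mathcal{S}_1^n$ and
$\mathbb{P}_e(r|\beta,c) = \mathbb{P}_e(r|\beta,c,1)$.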

III. Main results 
A. Bounds on the optimal average error probability 

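Theorem 1 (Bounds on the optimal average error probability): For $x > 0$ and $c > 0$, define

$$\delta_0(x) \triangleq \frac{c-1}{2x} - \frac{1}{2} + \frac{\sqrt{(1-c+x)^2 + 4cx}}{2x} \qquad (24)$$

with derivative

$$\delta_0'(x) = -\frac{\delta_0(x)\,(1+\delta_0(x))}{1-c+x+2x\delta_0(x)} \qquad (25)$$

and denote, for $\sigma^2 > 0$ defined in (10),

$$\theta_-^2 \triangleq -\beta \log\left( 1 - \frac{1}{c} \frac{\delta_0(\sigma^2)^2}{(1+\delta_0(\sigma^2))^2} \right) + c + \sigma^4 \delta_0'(\sigma^2) \qquad (26)$$

$$\theta_+^2 \triangleq -\beta \log\left( 1 - \frac{1}{c} \frac{\delta_0(\sigma^2)^2}{(1+\delta_0(\sigma^2))^2} \right) + 2\left( c - \sigma^2 \delta_0(\sigma^2) \right). \qquad (27)$$

Then, for the channel model (10) with unit input energy constraint, as $n \xrightarrow{(\beta,c)} \infty$,
(i) The per-antenna ergodic capacity $\mathbb{E}[C_{N,K}]$ satisfies

$$\mathbb{E}[C_{N,K}] = C + O(n^{-2}) \qquad (28)$$

with

$$C = \log(1+\delta_0(\sigma^2)) + c \log\left( 1 + \frac{1}{\sigma^2 (1+\delta_0(\sigma^2))} \right) - \frac{\delta_0(\sigma^2)}{1+\delta_0(\sigma^2)}. \qquad (29)$$

(ii) The optimal average error probability $\mathbb{P}_e(r|\beta,c)$ for the second-order coding rate satisfies

- If $r \le 0$,

$$\Phi\left(\frac{r}{\theta_-}\right) \le \mathbb{P}_e(r|\beta,c) \le \Phi\left(\frac{r}{\theta_+}\right) \qquad (30)$$

- If $r > 0$,

$$\frac{1}{2} \le \mathbb{P}_e(r|\beta,c) \le \Phi\left(\frac{r}{\theta_+}\right). \qquad (31)$$

Proof: The details of this proof are provided in Section IV-B. ■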
Theorem 1 shows that, for sufficiently large channel dimensions and block length, the optimal
error probability for a code rate close to the ergodic capacity, i.e., $(nK)^{-1} \log M_n = C + (nK)^{-1/2} r$,
is contained within two explicit bounds which depend only on c, $\beta$, and $\sigma^2$. This
is to be compared with the AWGN scenario of [11], [12] where the corresponding bounds were
found to depend only on $\sigma^2$. However, as opposed to Theorem 1, the lower and upper bounds in
these papers were shown to be equal. We discuss in Remark 3 below the technical reasons for
this important difference. Note that, for rates above the ergodic capacity limit (i.e., for r > 0),
the lower bound is very pessimistic and can be far from its associated upper bound. In contrast,
the more interesting case r < 0, corresponding to code rates below the ergodic capacity, features
two bounds which will be seen by simulations to be in general very close to one another (but
will be proved to be always distinct).
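
The closed-form quantities of Theorem 1 are straightforward to evaluate numerically. The sketch below is one possible implementation under stated assumptions: it takes $\delta_0$, $\delta_0'$, $C$, $\theta_-$, and $\theta_+$ as we read them from (24)-(29), uses natural logarithms (so rates are in nats), and introduces hypothetical helper names, including `rate_for_target`, which heuristically inverts the upper bound $\Phi(r/\theta_+)$ and is not a result stated in the paper.

```python
import math
from statistics import NormalDist

def delta0(x, c):
    """delta_0(x): positive root of x*d^2 + (1 - c + x)*d - c = 0."""
    return (c - 1) / (2 * x) - 0.5 + math.sqrt((1 - c + x) ** 2 + 4 * c * x) / (2 * x)

def delta0_prime(x, c):
    """Derivative of delta_0 with respect to x."""
    d = delta0(x, c)
    return -d * (1 + d) / (1 - c + x + 2 * x * d)

def capacity(sigma2, c):
    """Asymptotic per-antenna capacity C, in nats."""
    d = delta0(sigma2, c)
    return math.log(1 + d) + c * math.log(1 + 1 / (sigma2 * (1 + d))) - d / (1 + d)

def thetas(sigma2, beta, c):
    """Return (theta_minus, theta_plus) for noise variance sigma2."""
    d = delta0(sigma2, c)
    log_term = -beta * math.log(1 - d ** 2 / (c * (1 + d) ** 2))
    theta_minus_sq = log_term + c + sigma2 ** 2 * delta0_prime(sigma2, c)
    theta_plus_sq = log_term + 2 * (c - sigma2 * d)
    return math.sqrt(theta_minus_sq), math.sqrt(theta_plus_sq)

def error_bounds(r, sigma2, beta, c):
    """Lower and upper bounds of Theorem 1 (ii) for second-order rate r."""
    phi = NormalDist().cdf
    t_minus, t_plus = thetas(sigma2, beta, c)
    lower = phi(r / t_minus) if r <= 0 else 0.5
    return lower, phi(r / t_plus)

def rate_for_target(eps, n, K, sigma2, beta, c):
    """Heuristic: rate at which the asymptotic upper bound equals eps."""
    _, t_plus = thetas(sigma2, beta, c)
    r = t_plus * NormalDist().inv_cdf(eps)  # r < 0 whenever eps < 1/2
    return capacity(sigma2, c) + r / math.sqrt(n * K)
```

For instance, `error_bounds(-1.0, 0.1, 16.0, 2.0)` evaluates both bounds at an SNR of 10 dB for the parameters c = 2 and β = 16 used in Fig. 1. Since the theorem is asymptotic, any finite-n use of `rate_for_target` should be read as an approximation only.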
Remark 1 (On the quantity $\delta_0(\sigma^2)$): The function $c^{-1} \delta_0(\sigma^2)$ coincides with the Stieltjes transform
$m_{\mu_c}(z)$ of the Marčenko-Pastur measure $\mu_c$ with parameter c [26] evaluated at position
$z = -\sigma^2$, which is defined by $m_{\mu_c}(z) = \int (t - z)^{-1} \mu_c(dt)$ for all $z \in \mathbb{C} \setminus \mathrm{supp}(\mu_c)$. This measure
is the limiting distribution of the eigenvalues of $K^{-1} H^n (H^n)^H$ as $N, K \to \infty$ and $N/K \to c$.
For this reason, the quantities C, $\theta_-$, and $\theta_+$ of Theorem 1 naturally appear as functionals of
$\mu_c$ (or equivalently $\delta_0(\sigma^2)$). Many interesting properties can be deduced from Theorem 1 based
on this observation.

Remark 2 (Fluctuation around ergodic capacity): For the channel model (10), the optimal
average error probability may be alternatively written as

$$\mathbb{P}_e(r|\beta,c) = \inf_{\{\mathcal{C}_n :\, \mathrm{supp}(\mathcal{C}_n) \subseteq \mathcal{S}^n\}_{n=1}^{\infty}} \left\{ \limsup_{n \xrightarrow{(\beta,c)} \infty} P_e^{(n)}(\mathcal{C}_n) \,\middle|\, \liminf_{n \xrightarrow{(\beta,c)} \infty} \sqrt{nK} \left( \frac{1}{nK} \log M_n - \mathbb{E}[C_{N,K}] \right) \ge r \right\} \qquad (32)$$

since

$$\sqrt{nK} \left( \mathbb{E}[C_{N,K}] - C \right) \to 0 \qquad (33)$$

as $n \xrightarrow{(\beta,c)} \infty$ by Theorem 1 (i). In the finite N, K, n regime, we may therefore see the optimal
average error probability as an approximation of the optimal achievable error under the rate
constraint

$$\frac{1}{nK} \log M_n \ge \mathbb{E}[C_{N,K}] + \frac{r}{\sqrt{nK}}. \qquad (34)$$

Note that the relation (33) is fundamentally dependent on the Gaussianity of $H^n$. Indeed,
Theorem 1 (i) is a much stronger result than the well-known convergence of the per-antenna
mutual information to its asymptotic limit (see, e.g., [25]) which holds for channels composed of
arbitrary i.i.d. entries with finite second-order moment. It was precisely shown in [27, Theorem
4.4] that, whenever the entries of $H^n$ have a non-zero fourth order cumulant $\kappa = \mathbb{E}[|H_{ij}^n|^4] - 2$, a
bias term B proportional to $\kappa$ arises such that (33) must be modified to $\sqrt{nK} (\mathbb{E}[C_{N,K}] - C) \to B$
as $n \xrightarrow{(\beta,c)} \infty$. In this case the equivalence of (32) and (22) does not hold. For Gaussian channels
(since $\kappa = 0$ and then B = 0), however, the asymptotic mutual information is reached at the
sufficiently fast rate of $O(n^{-2})$ (as confirmed by Theorem 6 in Appendix D).
Remark 3 (Tightness of the bounds): For every $c, \beta, \sigma^2 > 0$,

$$\Phi\left(\frac{r}{\theta_+}\right) > \Phi\left(\frac{r}{\theta_-}\right), \quad r < 0; \qquad \Phi\left(\frac{r}{\theta_+}\right) > \frac{1}{2}, \quad r > 0. \qquad (35)$$

Apart from r = 0, the lower and upper bounds on the optimal average error probability are
therefore never equal. This is in sharp contrast to [11], [12] where, for SISO AWGN channels,
the bounds are proved to be equal. The reason for this discrepancy lies in the presence of the
random channel $H^n$ which naturally induces a dependence of the second order statistics of $I_{N,K}^{(n)}$
on the "fourth order moment" $\mathbb{E}[K^{-1} \mathrm{tr}\, (n^{-1} X^n (X^n)^H)^2]$ of $P_{X^n}$. The weak lower bound 1/2
for r > 0 is in particular a consequence of the impossibility to bound the fourth order moment
of $P_{X^n}$ from above under the sole constraint (11), see Section IV-B. In [11], [12], only (scalar)
second order moments of $P_{X^n}$ play a role in the second order statistics of $I_{N,K}^{(n)}$. These are easily
controlled by (11).

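Proof: The second inequality is trivial. We therefore focus on the case r < 0. Denoting $\mu_c$
the Marčenko-Pastur measure with parameter c, we have from Remark 1

$$\theta_+^2 - \theta_-^2 = c \left( 1 - 2 c^{-1} \sigma^2 \delta_0(\sigma^2) - c^{-1} \sigma^4 \delta_0'(\sigma^2) \right) \qquad (36)$$

$$= c \left( 1 - 2 \sigma^2 \int \frac{1}{t + \sigma^2} \mu_c(dt) + \sigma^4 \int \frac{1}{(t + \sigma^2)^2} \mu_c(dt) \right) \qquad (37)$$

$$= c \left( 1 - \int \frac{2 \sigma^2 t + \sigma^4}{(t + \sigma^2)^2} \mu_c(dt) \right) \qquad (38)$$

$$= c \int \frac{t^2}{(t + \sigma^2)^2} \mu_c(dt) > 0 \qquad (39)$$

which ensures that $\theta_+^2 > \theta_-^2$ for all $\sigma^2$ and all $c > 0$. The result follows by noticing that $\Phi$ is
increasing on $\mathbb{R}^-$. ■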
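
The positivity argument above can be checked numerically. The following sketch is an illustration under assumptions, not the authors' procedure: it integrates against the Marčenko-Pastur law with ratio c (continuous density $\sqrt{(b-t)(t-a)}/(2\pi c t)$ on $[a,b] = [(1-\sqrt{c})^2, (1+\sqrt{c})^2]$, plus an atom of mass $1 - 1/c$ at zero when c > 1) using a plain midpoint rule, and compares the result with the closed form $c - 2\sigma^2\delta_0(\sigma^2) - \sigma^4\delta_0'(\sigma^2)$. The function names are hypothetical, and c = 1 is avoided because the density is singular at the origin there.

```python
import math

def delta0(x, c):
    """delta_0(x): positive root of x*d^2 + (1 - c + x)*d - c = 0."""
    return (c - 1) / (2 * x) - 0.5 + math.sqrt((1 - c + x) ** 2 + 4 * c * x) / (2 * x)

def delta0_prime(x, c):
    """Derivative of delta_0 with respect to x."""
    d = delta0(x, c)
    return -d * (1 + d) / (1 - c + x + 2 * x * d)

def mp_integral(g, c, steps=20000):
    """Integrate g(t) against the Marchenko-Pastur law with ratio c != 1.

    Midpoint rule on the continuous part; for c > 1 the atom of mass
    1 - 1/c at t = 0 is added explicitly.
    """
    a, b = (1 - math.sqrt(c)) ** 2, (1 + math.sqrt(c)) ** 2
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        t = a + (i + 0.5) * h
        density = math.sqrt((b - t) * (t - a)) / (2 * math.pi * c * t)
        total += g(t) * density * h
    if c > 1:
        total += (1 - 1 / c) * g(0.0)
    return total

def theta_gap_closed_form(sigma2, c):
    """theta_+^2 - theta_-^2 written with delta_0 and its derivative."""
    return c - 2 * sigma2 * delta0(sigma2, c) - sigma2 ** 2 * delta0_prime(sigma2, c)

def theta_gap_integral(sigma2, c):
    """theta_+^2 - theta_-^2 written as c * int t^2 / (t + sigma2)^2 mu_c(dt)."""
    return c * mp_integral(lambda t: t ** 2 / (t + sigma2) ** 2, c)
```

The same `mp_integral` helper also gives a quick check of Remark 1, by comparing `delta0(s2, c) / c` with `mp_integral(lambda t: 1 / (t + s2), c)`.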
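Remark 4 (High SNR-regime): In the high SNR-regime, we have the following result:

$$\lim_{\sigma^2 \to 0} \theta_-^2 = \begin{cases} -\beta \log(1 - c) + c, & c < 1 \\ \infty, & c = 1 \\ -\beta \log\left(1 - \frac{1}{c}\right) + 1, & c > 1 \end{cases} \qquad (40)$$

$$\lim_{\sigma^2 \to 0} \theta_+^2 = \begin{cases} -\beta \log(1 - c) + 2c, & c < 1 \\ \infty, & c = 1 \\ -\beta \log\left(1 - \frac{1}{c}\right) + 2, & c > 1. \end{cases} \qquad (41)$$

Proof: Using the definition of $\delta_0(x)$ in Theorem 1, it is easy to show that

$$\lim_{x \to 0} x \delta_0(x) = \begin{cases} 0, & c \le 1 \\ c - 1, & c > 1. \end{cases} \qquad (42)$$

Using the relation $(1 + \delta_0(x))^{-1} = 1 - c + x \delta_0(x)$ (see Property 1 (v) in Appendix D) and taking
the limit $x \to 0$ leads to

$$\lim_{x \to 0} \delta_0(x) = \begin{cases} \frac{c}{1-c}, & c < 1 \\ \infty, & c \ge 1. \end{cases} \qquad (43)$$

Using the last result, it follows from the definition of $\delta_0'(x)$ that

$$\lim_{x \to 0} x^2 \delta_0'(x) = \begin{cases} 0, & c \le 1 \\ 1 - c, & c > 1. \end{cases} \qquad (44)$$

Replacing (42)-(44) in the definitions of $\theta_-^2$ and $\theta_+^2$ while taking the limit $\sigma^2 \to 0$ concludes
the proof. ■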
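
As a sanity check of the high-SNR limits, one can evaluate $\theta_-^2$ and $\theta_+^2$ at a very small noise variance and compare with the limiting expressions. The sketch below assumes the forms of $\theta_\pm^2$ as we read them from Theorem 1 and the limits (40)-(41) for $c \ne 1$; the function names are hypothetical.

```python
import math

def delta0(x, c):
    """delta_0(x): positive root of x*d^2 + (1 - c + x)*d - c = 0."""
    return (c - 1) / (2 * x) - 0.5 + math.sqrt((1 - c + x) ** 2 + 4 * c * x) / (2 * x)

def delta0_prime(x, c):
    """Derivative of delta_0 with respect to x."""
    d = delta0(x, c)
    return -d * (1 + d) / (1 - c + x + 2 * x * d)

def theta_sq(sigma2, beta, c):
    """Return (theta_minus^2, theta_plus^2) at noise variance sigma2."""
    d = delta0(sigma2, c)
    log_term = -beta * math.log(1 - d ** 2 / (c * (1 + d) ** 2))
    minus = log_term + c + sigma2 ** 2 * delta0_prime(sigma2, c)
    plus = log_term + 2 * (c - sigma2 * d)
    return minus, plus

def high_snr_limits(beta, c):
    """Limiting values of (theta_minus^2, theta_plus^2) as sigma2 -> 0, for c != 1."""
    if c == 1:
        raise ValueError("both limits are infinite at c = 1")
    if c < 1:
        base = -beta * math.log(1 - c)
        return base + c, base + 2 * c
    base = -beta * math.log(1 - 1 / c)
    return base + 1, base + 2
```

Comparing `theta_sq(1e-6, beta, c)` with `high_snr_limits(beta, c)` for a few values of c on either side of 1 illustrates how the two regimes of (42)-(44) enter the limits.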
Remark 5 (Low SNR-regime): Both $\theta_+^2$ and $\theta_-^2$ converge to 0 as $\sigma^2 \to \infty$. Thus, for r < 0,
the upper and lower bounds on $\mathbb{P}_e(r|\beta,c)$ are equal to zero and, for r > 0, the upper bound
jumps to 1. However, the capacity C is also zero. This being said, since coding at high rates
over large dimensional channels can be achieved at very low SNR, first order approximations
of C and $\theta_-^2$, $\theta_+^2$ for $\sigma^2 \to \infty$ are meaningful. These are given by

$$C = \frac{c}{\sigma^2} + O(\sigma^{-4}) \qquad (45)$$

$$\theta_-^2 = \frac{2c}{\sigma^2} + O(\sigma^{-4}) \qquad (46)$$

$$\theta_+^2 = \frac{2c}{\sigma^2} + O(\sigma^{-4}). \qquad (47)$$

This shows in particular that $(\theta_+^2 - \theta_-^2)/\theta_+^2 = O(\sigma^{-2})$, implying the asymptotic closeness of the
upper and lower bounds in the low SNR regime.
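
The low-SNR expansions can be illustrated in the same way. The sketch below evaluates the exact expressions at a large noise variance and compares them with the first-order terms $c/\sigma^2$ and $2c/\sigma^2$; it assumes the closed forms as we read them from Theorem 1, and the function name `low_snr_ratios` is hypothetical.

```python
import math

def delta0(x, c):
    """delta_0(x): positive root of x*d^2 + (1 - c + x)*d - c = 0."""
    return (c - 1) / (2 * x) - 0.5 + math.sqrt((1 - c + x) ** 2 + 4 * c * x) / (2 * x)

def delta0_prime(x, c):
    """Derivative of delta_0 with respect to x."""
    d = delta0(x, c)
    return -d * (1 + d) / (1 - c + x + 2 * x * d)

def low_snr_ratios(sigma2, beta, c):
    """Ratios of exact C, theta_-^2, theta_+^2 to their first-order terms.

    All three ratios should approach 1 as sigma2 grows.
    """
    d = delta0(sigma2, c)
    cap = math.log(1 + d) + c * math.log(1 + 1 / (sigma2 * (1 + d))) - d / (1 + d)
    log_term = -beta * math.log(1 - d ** 2 / (c * (1 + d) ** 2))
    minus = log_term + c + sigma2 ** 2 * delta0_prime(sigma2, c)
    plus = log_term + 2 * (c - sigma2 * d)
    return cap / (c / sigma2), minus / (2 * c / sigma2), plus / (2 * c / sigma2)
```

Evaluating `low_snr_ratios` for increasing `sigma2` shows the deviations from 1 shrinking roughly like $1/\sigma^2$, consistent with the $O(\sigma^{-4})$ remainders.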

Figure 1 depicts the bounds on the optimal average error probability for varying second-order
coding rates r and for different SNR values (defined as SNR = $\sigma^{-2}$), including also the extreme
high- and low-SNR cases. We choose c = 2 and $\beta$ = 16. For negative second-order coding rates,
the gap between the upper and lower bounds is rather small and decreases with either growing
r or decreasing SNR.

Remark 6 (Second-order outage probability): Instead of the optimal average error probability,
we may consider the second-order outage probability $\mathbb{P}_{\mathrm{out}}(r|\beta,c)$ for the rate r, which we define
as

$$\mathbb{P}_{\mathrm{out}}(r|\beta,c) \triangleq \inf_{\{\mathcal{C}_n :\, \mathrm{supp}(\mathcal{C}_n) \subseteq \mathcal{S}^n\}_{n=1}^{\infty}} \left\{ \limsup_{n \xrightarrow{(\beta,c)} \infty} P_e^{(n)}(\mathcal{C}_n) \,\middle|\, \liminf_{n \xrightarrow{(\beta,c)} \infty} K \left( \frac{1}{nK} \log M_n - C \right) \ge r \right\}. \qquad (48)$$

Fig. 1. Bounds on the optimal average error probability as a function of the second-order coding rate r for different SNRs
and the parameters c = 2 and β = 16.




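Note that the second-order outage probability and the optimal average error probability are
related by $\mathbb{P}_{\mathrm{out}}(r|\beta,c) = \mathbb{P}_e(r\sqrt{\beta}\,|\,\beta,c)$. This definition allows us to study the behavior of the
second-order outage probability for growing $\beta$. In the finite dimensional setting, this corresponds
to increasing the block length while maintaining N and K (and thus the capacity KC) fixed.
This cannot be performed on $\mathbb{P}_e(r|\beta,c)$ since, by growing n, $\sqrt{nK}\,C$ grows as well. From the
above definition, we have

$$\min\left\{ \Phi\left(\frac{r}{\theta_-^{\mathrm{out}}}\right), \frac{1}{2} \right\} \le \mathbb{P}_{\mathrm{out}}(r|\beta,c) \le \Phi\left(\frac{r}{\theta_+^{\mathrm{out}}}\right) \qquad (49)$$

where

$$\theta_-^{\mathrm{out}} \triangleq \sqrt{ -\log\left( 1 - \frac{\delta_0(\sigma^2)^2}{c (1+\delta_0(\sigma^2))^2} \right) + \frac{1}{\beta} \left( c + \sigma^4 \delta_0'(\sigma^2) \right) } \qquad (50)$$

$$\theta_+^{\mathrm{out}} \triangleq \sqrt{ -\log\left( 1 - \frac{\delta_0(\sigma^2)^2}{c (1+\delta_0(\sigma^2))^2} \right) + \frac{2}{\beta} \left( c - \sigma^2 \delta_0(\sigma^2) \right) }. \qquad (51)$$

Interestingly, for r < 0, as $\beta \to \infty$, we recover the limiting outage probability of MIMO
Gaussian fading channels [28], [27]:

$$\lim_{\beta \to \infty} \mathbb{P}_{\mathrm{out}}(r|\beta,c) = \Phi\left(\frac{r}{\theta^{\mathrm{out}}}\right) \qquad (52)$$

with

$$\theta^{\mathrm{out}} \triangleq \sqrt{ -\log\left( 1 - \frac{\delta_0(\sigma^2)^2}{c (1+\delta_0(\sigma^2))^2} \right) }. \qquad (53)$$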

Although both results coincide, there is a fundamental difference in the way they are obtained.
In [28], [27], the block length is assumed to be infinitely large from the start and then the limit
is taken in N and K. In contrast, we have obtained (52) by changing the order of both limits.
Note also that, while $\Phi(r/\theta_-^{\mathrm{out}})$ and $\Phi(r/\theta_+^{\mathrm{out}})$ are decreasing functions of $\beta$ for r < 0, $\Phi(r/\theta_+^{\mathrm{out}})$
is increasing in $\beta$ for r > 0. Although no tight lower bound was derived for r > 0, this strongly
suggests the existence of a crossing point for the optimal average error probability for an error
rate of 1/2. This in turn suggests that good codes with short block lengths perform better than
long codes when the coding rate exceeds the channel capacity (e.g., at very low SNR). Indeed,
if not, i.e., if long codes perform at least as well as short codes, then the optimal average error
probability would equal precisely 1/2 for all r > 0, which would go against intuition. We will
see a practical example of this crossing point effect in Figure 0] in Section ITlI-Cl 

Figure 2 depicts the bounds on $\mathbb{P}_{\mathrm{out}}(r|\beta,c)$ in (49) as a function of $\beta$ for different values of
c, assuming SNR = 10 dB and r = −1 fixed. For each value of c we also provide the limiting
outage probability as given in (52). The upper and lower bounds are seen to approach the outage
probability at a rate $O(1/\beta)$ as $\beta$ grows, which is easily confirmed by direct calculus.
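
The behavior of the outage bounds for growing β can be reproduced with a few lines of code. This is a sketch under assumptions: it uses $\theta_\pm^{\mathrm{out}} = \theta_\pm/\sqrt{\beta}$, which follows from the relation between the second-order outage probability and the optimal average error probability stated above, together with the forms of $\theta_\pm$ as we read them from Theorem 1. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def delta0(x, c):
    """delta_0(x): positive root of x*d^2 + (1 - c + x)*d - c = 0."""
    return (c - 1) / (2 * x) - 0.5 + math.sqrt((1 - c + x) ** 2 + 4 * c * x) / (2 * x)

def delta0_prime(x, c):
    """Derivative of delta_0 with respect to x."""
    d = delta0(x, c)
    return -d * (1 + d) / (1 - c + x + 2 * x * d)

def outage_thetas(sigma2, beta, c):
    """Return (theta_minus_out, theta_plus_out, theta_out)."""
    d = delta0(sigma2, c)
    base = -math.log(1 - d ** 2 / (c * (1 + d) ** 2))  # beta-independent part
    minus = base + (c + sigma2 ** 2 * delta0_prime(sigma2, c)) / beta
    plus = base + 2 * (c - sigma2 * d) / beta
    return math.sqrt(minus), math.sqrt(plus), math.sqrt(base)

def outage_bounds(r, sigma2, beta, c):
    """Return (lower bound, upper bound, limiting outage probability)."""
    phi = NormalDist().cdf
    t_minus, t_plus, t_lim = outage_thetas(sigma2, beta, c)
    lower = min(phi(r / t_minus), 0.5)
    return lower, phi(r / t_plus), phi(r / t_lim)
```

With `sigma2 = 0.1` (SNR = 10 dB) and `r = -1`, calling `outage_bounds` for increasing `beta` shows both bounds decreasing toward the limiting value, with the gap shrinking roughly in proportion to 1/β.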

Fig. 2. Bounds on the second-order outage probability as a function of β for different values of c, r = −1, and SNR = 10 dB.
The limiting outage probability is P_out = P_out(r|∞, c).



B. Finite dimensional approximation

In this section, we apply Theorem 1 to obtain upper-bound approximations of the optimal
average error probability for arbitrary coding rates $R$ in the finite dimensional regime. For
practical purposes, we assume transmissions with an average energy constraint rather than a
peak energy constraint. To this end, we define $(P_e^{(n)}, M_n, 1)$-codes satisfying a unit average
energy constraint:

$$\mathcal{S}^n = \left\{ P_{X^n} \in \mathcal{P}(\mathbb{C}^{K\times n}) \,:\, \frac{1}{nK}\,\mathbb{E}\!\left[\operatorname{tr} X^n (X^n)^{\mathsf H}\right] \le 1 \right\}. \tag{54}$$

We then define the optimal average error probability $\mathbb{P}_e^{(n)}(R)$ for rate $R$ under unit average
energy constraint as

$$\mathbb{P}_e^{(n)}(R) = \inf_{\mathcal{C}_n \,:\, P_{X^n} \in \mathcal{S}^n,\ \frac{1}{nK}\log M_n \ge R} P_e^{(n)}(\mathcal{C}_n) \tag{55}$$

where $P_e^{(n)}(\mathcal{C}_n)$ is the average error probability for a given $(P_e^{(n)}, M_n, 1)$-code.

Before we continue, we need to introduce an auxiliary lemma which is a simple generalization
of Feinstein's lemma [3] to arbitrary input distributions:

Lemma 1 (Variation of Feinstein's lemma): Let $n \ge 1$ be an integer and let $P_{X^n} \in \mathcal{A}^n$ be an
arbitrary probability measure where $\mathcal{A}^n \subset \mathcal{P}(\mathbb{C}^{K\times n})$. Denote by $Y^n$ the output from the channel
corresponding to the input $X^n$ and random fading $H^n$. Then, there exists a block
length $n$ codebook of size $M_n$ that, together with the maximum a posteriori (MAP) decoder,
forms a code $\mathcal{C}_n$ whose average error probability $P_e^{(n)}(\mathcal{C}_n)$ satisfies

$$P_e^{(n)}(\mathcal{C}_n) \le \inf_{\gamma>0}\left\{ \Pr\!\left[\log\frac{P_{Y^n|X^n,H^n}(dY^n|X^n,H^n)}{P_{Y^n|H^n}(dY^n|H^n)} \le \log\gamma\right] + \frac{M_n}{\gamma}\right\}. \tag{56}$$

Proof: The proof is provided in Appendix A-B. ■
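Since Lemma 1 holds in particular for $\mathcal{A}^n = \mathcal{S}^n$, it can be used to prove the following result:²

Theorem 2 (Approximation of Feinstein's upper bound): Let $\{R_n\}_{n=1}^{\infty}$ be a real sequence. Then,
there exists a real sequence $\{\ell_n\}_{n=1}^{\infty}$ such that

$$\mathbb{P}_e^{(n)}(R_n) \le \Phi\!\left(\frac{\sqrt{nK}}{\theta_+}\left(R_n - C + \delta_n^*\right)\right) + \exp\left(-nK\delta_n^*\right) + \ell_n \tag{57}$$

with $\ell_n \downarrow 0$ as $n \xrightarrow{(\beta,c)} \infty$, where

$$\delta_n^* = \left(C - R_n + \theta_+^2\right)\left[1 - \sqrt{1 - \frac{(C-R_n)^2 + (nK)^{-1}\theta_+^2\log\left(2\pi nK\theta_+^2\right)}{\left(C - R_n + \theta_+^2\right)^2}}\,\right]. \tag{58}$$

Proof: The proof is provided in Section IV-C. ■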
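
The role of $\delta_n^*$ can be checked numerically. Setting the derivative of $\delta \mapsto \Phi(\theta_+^{-1}\sqrt{nK}(R_n - C + \delta)) + e^{-nK\delta}$ to zero and taking logarithms gives a quadratic equation in $\delta$, whose smaller root is the minimizer. The sketch below compares that root with a brute-force grid search. The numerical values ($K = 8$, $n = 36$, $\theta_+ = 1$, $R_n - C = -0.05$) are hypothetical and only serve the illustration; in the paper $\theta_+$ is the constant defined in (27).

```python
import math


def std_normal_cdf(x):
    """Standard Gaussian distribution function Phi."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def objective(delta, rate_gap, n_k, theta_plus):
    """Phi(sqrt(nK)/theta_+ * (R_n - C + delta)) + exp(-nK*delta), with rate_gap = R_n - C."""
    return (std_normal_cdf(math.sqrt(n_k) * (rate_gap + delta) / theta_plus)
            + math.exp(-n_k * delta))


def delta_star(rate_gap, n_k, theta_plus):
    """Smaller root of the stationarity condition of `objective` in delta."""
    a = -rate_gap + theta_plus ** 2  # C - R_n + theta_+^2
    num = (rate_gap ** 2
           + theta_plus ** 2 * math.log(2 * math.pi * n_k * theta_plus ** 2) / n_k)
    return a * (1 - math.sqrt(1 - num / a ** 2))


if __name__ == "__main__":
    # Hypothetical parameters, chosen only for illustration.
    n_k, theta_plus, gap = 8 * 36, 1.0, -0.05
    closed_form = delta_star(gap, n_k, theta_plus)
    grid = [i * 1e-5 for i in range(1, 50001)]  # delta in (0, 0.5]
    brute_force = min(grid, key=lambda d: objective(d, gap, n_k, theta_plus))
    print(closed_form, brute_force)
    print(objective(closed_form, gap, n_k, theta_plus))
```

The larger root of the quadratic is a local maximum of the objective, which is why only the smaller root is retained.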

As shown in the proof, Theorem 2 fully exploits Lemma 1 in the sense that, for all finite $n$, the
optimal choice for $\gamma$ (whose role is played by $\delta_n^*$ here) in (56) is considered. This optimal choice
$\delta_n^*$ can be made arbitrarily close to zero in the $n \to \infty$ limit, which is why it does not appear
in Theorem 1. Nonetheless, since we cannot obtain any precise information on the convergence
rate of $\ell_n$ to zero with respect to that of $\delta_n^*$, the potential gains of Theorem 2 cannot be assessed
analytically. This is in contrast with [12] where a Berry-Esseen inequality is used to obtain this
precision. However, it is easily shown that, as $R_n = C + r/\sqrt{nK}$, $\delta_n^* = O(n^{-2}\log n)$. Although
not proved in the article, we conjecture that $\ell_n = O(n^{-1})$ which suggests that the role of $\delta_n^*$
in the finite dimensional approximation is non-negligible. This conjecture is deduced from the
following observations. First, in the proof of Theorem 4 we show that the difference between
the characteristic function of the random variable $\sqrt{nK}(I_{N,K}^{(n)} - C)$ and that of a Gaussian
random variable with zero mean and variance $\theta_+^2$ is of order $O(n^{-1})$ (this is exactly given by
(435)). This convergence is a necessary (but not sufficient) condition for $\ell_n = O(n^{-1})$. This
being said, since $I_{N,K}^{(n)}$ is a smooth function of Gaussian random variables, it is very unlikely
that $\ell_n \to 0$ at a slower rate. An explicit proof of this fact could follow from [29, Theorem 2.2]
which can be applied to upper bound $\ell_n$ by a function of the Euclidean norm of the gradient
and of the spectral norm of the Hessian matrix of $\sqrt{nK}(I_{N,K}^{(n)} - C)$ when seen as a function
of $2KN + 2Kn + 2Nn$ real Gaussian variables (the real entries of $H^n$, $X^n$, $W^n$).³

² Note that Lemma 1 is also required in the proof of Proposition 1 (ii) in Appendix A-C which is the cornerstone result for
the proof of Theorem 1 in Section IV-B.

Remark 7: In [20], it was conjectured that for certain families of iteratively decoded code
ensembles and for general channel models, a canonical scaling law of the optimal error probability
$P_B(p)$ exists, which takes the form

$$P_B(p) = \Phi\!\left(\frac{\sqrt{n}}{\alpha}\left(C(p^*) - C(p) + \beta n^{-2/3}\right)\right) + O\!\left(n^{-1/6}\right) \tag{59}$$

where $C(p)$ is the channel capacity for a certain channel parameter $p$, $C(p^*)$ is the critical point at
which the phase transition from the error-floor region to the waterfall region of the code occurs,
$\alpha$ is a scaling parameter, and $\beta n^{-2/3}$ corresponds to a threshold shift due to the finite block length
$n$. Note here that only $n$ is allowed to grow as the authors do not necessarily consider a MIMO
setting. Interestingly, there are several striking similarities between (59) and the upper bound
in (57). First, the scaling parameter $\alpha$ can be identified with the standard deviation $\theta_+$ of the
mutual information density. In [20], it was shown that $\alpha$ corresponds to the standard deviation
of the number of degree-one check nodes (or the number of erasures in the graph) at the critical

³ Using comparison bounds between the spectral and Frobenius norms of the Hessian matrix, it seems not too difficult to
prove that for all $\epsilon > 0$, $\ell_n = O(n^{-1+\epsilon})$, but the explicit bound on the spectral norm, which we believe is $O(n^{-1})$, is more
involved to obtain.

Fig. 3. Upper bounds on $\mathbb{P}_e^{(n)}(R)$ as a function of the coding rate $R$ for SNR = 10 dB and different values of $K$, $N$, and $n$.

point after density and covariance evolution. The term $\beta$ plays a role similar to that of $\delta_n^*$, while
the difference $C(p^*) - C(p)$ can be identified with the difference between the code rate and
the channel capacity $R_n - C$. Therefore, although iterative codes show a back-off from channel
capacity which is null in our current setting, both (59) and Theorem 2 have similar limiting error
probability behavior.

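Figure 3 provides the comparative performance of Theorem 1 and Theorem 2 as an approximation
of Feinstein's upper bound (Lemma 1) for $\mathbb{P}_e^{(n)}(R)$. Precisely, the curves of Figure 3 are
associated to the following approximations of the upper bound on $\mathbb{P}_e^{(n)}(R)$:

$$\begin{cases}
\inf_{\delta>0}\left\{\Pr\!\left[I_{N,K}^{(n)} \le R + \delta\right] + e^{-nK\delta}\right\} & \text{(Feinstein)}\\[4pt]
\Phi\!\left(\frac{\sqrt{nK}}{\theta_+}\left(R - C + \delta_n^*\right)\right) + e^{-nK\delta_n^*} & \text{(Theorem 2)}\\[4pt]
\Phi\!\left(\frac{\sqrt{nK}}{\theta_+}\left(R - C\right)\right) & \text{(Theorem 1)}
\end{cases} \tag{60}$$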



where $I_{N,K}^{(n)}$ is the mutual information density for Gaussian inputs $X^n$ defined in (13) with
$\Gamma = 1$. We consider three different sets of parameters $(K, N, n)$. As expected, the larger all
these parameters, the smaller the gap between the bounds of Theorem 2 and Theorem 1. For
small values of these parameters, the approximation by Theorem 2 provides a much better
estimate of Feinstein's upper bound due to a non-negligible value of $\delta_n^*$.



C. Comparison with practical codes 

We now use the results of the previous sections to assess the performance of realistic codes 
and to compare their qualitative behavior to that of our bounds. 

First, we consider a scenario with $K = 8$ transmit and $N = 16$ receive antennas employing
QPSK modulation at each antenna. Coding and modulation are set up in a conventional
bit-interleaved coded modulation (BICM) scheme, with a small random interleaver separating the
code and the modulation. At the receiver, we employ a non-iterative demodulation scheme, with
a MAP MIMO demodulator based on a full codebook enumeration. We consider short LDPC
codes and take as an example the rate 1/2 code used in the WiMAX standard [30]. This code is
a quasi-cyclic irregular repeat-accumulate (IRA) LDPC code where the accumulator is slightly
modified to ease the encoding circuit. The code is constructed from a $12 \times 24$ protograph
with the permutations being $S \times S$ cyclically shifted identity matrices. This code is intended for
relatively small, adaptable block lengths. The block length adaptation is performed by changing
the dimension $S$ (also called lifting factor) of the permutation matrices and by adapting the
shifts accordingly. The parameter $S$ can be chosen to take 19 discrete values between 24 and
96, allowing for code blocks ranging between $n' = 576$ bits and $n' = 2304$ bits, corresponding
to a range of $n = n'/(2K) \in \{36, \ldots, 144\}$ channel uses.
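
The mapping from the lifting factor $S$ to the number of channel uses $n$ can be spelled out as follows. This is a small sketch under one assumption not stated above, namely that the 19 admissible values of $S$ are equally spaced (step 4) between 24 and 96; the constant and function names are illustrative.

```python
K = 8                    # transmit antennas
BITS_PER_SYMBOL = 2      # QPSK carries 2 coded bits per antenna and channel use
PROTOGRAPH_COLUMNS = 24  # the protograph is 12 x 24

# Assumption: the 19 lifting factors are equally spaced between 24 and 96.
LIFTING_FACTORS = list(range(24, 97, 4))


def code_bits(lifting_factor):
    """Code block length n' in bits for a given lifting factor S."""
    return PROTOGRAPH_COLUMNS * lifting_factor


def channel_uses(lifting_factor):
    """Number of channel uses n = n' / (2K)."""
    return code_bits(lifting_factor) / (K * BITS_PER_SYMBOL)


if __name__ == "__main__":
    for s in LIFTING_FACTORS:
        print(s, code_bits(s), channel_uses(s))
```

The two extreme lifting factors reproduce the block lengths quoted in the text, $n' = 576$ and $n' = 2304$ bits, i.e., $n = 36$ and $n = 144$ channel uses.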

Note that the performance could be slightly improved by employing iterative decoding and
demodulation, i.e., including the MIMO demapper within the decoding loop [31]. We however
do not take this additional loop into account as the expected performance gains are relatively
small with short code (and thus interleaver) lengths under the given setup. Furthermore, the
decoding complexity is largely increased by the multiple executions of the demodulator.

Fig. 4. Approximate bounds on the error probability for finite $n$, as a function of the SNR $= 1/\sigma^2$, $r = K(R - C)$ for $K = 8$,
$N = 16$, $R = \log(2)$, $n \in \{36, 144\}$, $C$ being evaluated with $c = N/K$, $\beta = n/K$ and for different SNR values. Theoretical
curves are compared to a rate 1/2 LDPC QPSK code (giving $R = \log(2)$).

In Figure 4 we compare the error probability of the code described above for $n \in \{36, 144\}$
against the upper bound by Theorem 2. Since QPSK modulation together with a rate 1/2 code
is used, the coding rate in nats is $R = \log(2)$. We can make several interesting observations
from this figure. For both block lengths, the SNR-gap between the simulation results and the
corresponding upper bounds by Theorem 2 is roughly constant (about 4 dB) for a large range
of SNR values. This suggests that the upper bound from Theorem 2 adequately describes the
code behavior in terms of the error probability, indicating in particular that the observed error
probability trend of the LDPC code is linked to the finite block length rather than the structure
of the code itself. This is also in agreement with the general scaling law described in Remark 7
in Section III-B. Besides, note that both theoretical and simulated curves exhibit a crossing point
close to 1/2 error probability, which is in line with Remark 6 in Section III-A.

Figure 5 shows the bounds on the outage probability as given by Remark 6 in Section III-A and
the error probability of the real code when the block length $n$ varies for a fixed SNR = 3.5 dB.
In order to facilitate the comparison, we adapt the SNR for the theoretical bounds in such a
way that the upper bound by Theorem 2 equals the error probability of the real code for the
smallest block length $n = 36$. This corresponds to subtracting the aforementioned SNR-gap.
Interestingly, Theorem 2 perfectly mimics the behavior of the real code, which shows that
the SNR-gap of the LDPC code to the upper bound not only remains constant over a large range of
SNR, but also remains constant for a certain range of block lengths $n$ (a similar observation
was made in [12, Sec. IV-D]). This indicates again that the performance of the LDPC code for
practical SNR and block length values is limited by physical constraints rather than by the code
structure. This remark is likely to hold for other classes of practical codes (see also Remark 7
in Section III-B), which implies that Theorem 2 can be used in practice to predict by how much
the block length of a code should be increased to achieve a desired reliability improvement.

IV. Proofs of the main results 

This section is dedicated to the proofs of the two main results, Theorem 1 and Theorem 2. Our
approach relies on information spectrum methods [6] and is more precisely related to Hayashi's
proof techniques used in [11]. In the course of these proofs we will require several additional
results whose detailed proofs are deferred to the appendices.

Fig. 5. Approximate bounds on the outage probability for finite $n$, as a function of $n/K$, $r = K(R - C)$ for $K = 8$,
$N = 16$, and SNR = -0.785 dB, where $C$ is computed with $c = N/K$, $\beta = n/K$ and $R = \log(2)$. The limiting outage
probability is $P_{\mathrm{out}} = P_{\mathrm{out}}(r|\infty, c)$. Theoretical curves are compared to a rate 1/2 LDPC QPSK code (giving $R = \log(2)$)
assuming SNR = 3.25 dB. The SNRs were adapted so that the finite $n$ bound of Theorem 2 matches the error probability of
the LDPC code when $n = 36$.

A. Auxiliary definitions

In order to simplify the notation in the following analysis, we need to define several auxiliary
functions and quantities:



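For integers $t \ge 0$ and positive $x$, we define the functions $\delta_t(x)$ and $\gamma_t(x)$ respectively as

$$\delta_0(x) = \frac{c-1}{2x} - \frac{1}{2} + \frac{\sqrt{(1-c+x)^2 + 4cx}}{2x} \tag{61}$$

$$\delta_t(x) = \frac{\delta_{t-1}(x)\left(1+\delta_0(x)\right) + \sum_{k=1}^{t-1}\left(\delta_{k-1}(x) - x\,\delta_k(x)\right)\delta_{t-k}(x)}{1 - c + x\left(1+\delta_0(x)\right) + x\,\delta_0(x)}, \quad t \ge 1 \tag{62}$$

$$\gamma_0(x) = \frac{\delta_0(x)}{1+\delta_0(x)} \tag{63}$$

$$\gamma_t(x) = \delta_{t-1}(x) - x\,\delta_t(x), \quad t \ge 1. \tag{64}$$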
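
The recursion for $\delta_t$ is straightforward to implement. The following sketch, which assumes the forms $\delta_0(x) = \frac{c-1}{2x} - \frac12 + \frac{\sqrt{(1-c+x)^2+4cx}}{2x}$, $\gamma_0 = \delta_0/(1+\delta_0)$ and $\gamma_t = \delta_{t-1} - x\delta_t$, evaluates the first terms and checks the identity $\delta_1(x) = -\delta_0'(x)$ by a central finite difference. Function names and test values are illustrative.

```python
import math


def delta_sequence(t_max, x, c):
    """Return [delta_0(x), ..., delta_{t_max}(x)] via the closed form and the recursion."""
    d = [(c - 1) / (2 * x) - 0.5 + math.sqrt((1 - c + x) ** 2 + 4 * c * x) / (2 * x)]
    denom = 1 - c + x * (1 + d[0]) + x * d[0]
    for t in range(1, t_max + 1):
        # Convolution term: sum_{k=1}^{t-1} (delta_{k-1} - x*delta_k) * delta_{t-k}.
        conv = sum((d[k - 1] - x * d[k]) * d[t - k] for k in range(1, t))
        d.append((d[t - 1] * (1 + d[0]) + conv) / denom)
    return d


def gamma_sequence(t_max, x, c):
    """Return [gamma_0(x), ..., gamma_{t_max}(x)]."""
    d = delta_sequence(t_max, x, c)
    return [d[0] / (1 + d[0])] + [d[t - 1] - x * d[t] for t in range(1, t_max + 1)]


if __name__ == "__main__":
    x, c, h = 0.1, 2.0, 1e-5  # illustrative values of sigma^2 and c

    def d0(y):
        return delta_sequence(0, y, c)[0]

    d = delta_sequence(2, x, c)
    # delta_1(x) should match -d/dx delta_0(x).
    print(d[1], -(d0(x + h) - d0(x - h)) / (2 * h))
    print(gamma_sequence(1, x, c))
```

The denominator of the recursion equals $\sqrt{(1-c+x)^2+4cx}$ and is therefore strictly positive for $x > 0$.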
Details about these functions are provided in Proposition 7 in the appendix. They are presently
only necessary to define the following constants:

Co = 7i(^ 2 ) (65) 

t x ysii* 2 ) 2 + 2q 2 r^o (^ 2 ) (i + fr. {<r 2 ))H° 2 ) _ a r 

^ ' 1 S (a 2 ) (1 + 5 (^))3 P J a2 1 - c + u(l + 2io(u)) " 

(66) 

^ 2 _ ^ ^^^i^-^o^ + ^ Cor 2 )) {-aH^f + (1 + 25 (a 2 ))<5 1 ( ( x 2 ) 2 - ^(a 2 ) 2 ^ 2 )) 
si — P 



(5 (a 2 )(l + 5 (a 2 )y) 2 



(67) 



C2 = ^oW' (68) 

From ^ and £ 2 , we further construct the polynomial 

G^HC^ + CiV. (69) 

It is shown in Appendix B-B that these expressions are well defined for each $\sigma^2$. The exact
characterization of $\zeta_0$, $\zeta_1(x)$, $\zeta_2$ is not important. The only information required here is that
$\zeta_0 > 0$, $\zeta_2 > 0$, and $\zeta_1(x) \to 0$ as $x \to 0$. The first two relations follow from the fact that
$c^{-1}\delta_0(\sigma^2)$ is the Stieltjes transform of the Marcenko-Pastur law $\mu_c$ taken in $-\sigma^2$ (see, e.g., [32,
Chapter 3.2]) so that in particular

$$\gamma_1(\sigma^2) = \delta_0(\sigma^2) - \sigma^2\delta_1(\sigma^2) \tag{70}$$

$$= c\left(\int \frac{1}{t+\sigma^2}\,\mu_c(dt) - \sigma^2\int \frac{1}{(t+\sigma^2)^2}\,\mu_c(dt)\right) \tag{71}$$

$$= c\int \frac{t}{(t+\sigma^2)^2}\,\mu_c(dt) > 0 \tag{72}$$

$$\delta_1(\sigma^2) = -\delta_0'(\sigma^2) = c\int \frac{1}{(t+\sigma^2)^2}\,\mu_c(dt) > 0 \tag{73}$$

where the identity $\delta_1(\sigma^2) = -\delta_0'(\sigma^2)$ follows from Property 1 (vi) in Appendix D. The third
relation is obvious. Lastly, we need to verify that $\theta_-$, defined in (26), satisfies $\theta_-^2 > 0$. For
this, first note that the logarithm term is well defined. Indeed, for $c \ge 1$, the argument is clearly
positive. For $c < 1$, by Property 1 (iv) in Appendix D, $\delta_0(\sigma^2)^2(1+\delta_0(\sigma^2))^{-2} = (c - \sigma^2\delta_0(\sigma^2))^2 < c^2$,
with the inequality arising from Property 1 (i) and (ii) in Appendix D; this then implies
that the argument is greater than $1 - c > 0$. Obviously, in both cases, as the argument of the
logarithm is less than one, the logarithm itself is negative. This implies that

(74) 

W tx 4 Mtx 2 )(l + ^ 2 )) m , 

l-c + a 2 (l + 5 (a 2 ))+a 2 5 (a 2 ) 1 ' 

{b) ca 2 (l + 5 (a 2 )) 

= c ■ — 2 . 9 , (76) 

C >' c fl - (77) 



a 2 + a 2 c, (c^ 2 ) 



(78) 



where (a) follows from the definition of S' (x) in Property [T] (w) in Appendix iDl (b) follows 
from Property Q] (Hi) in Appendix O and (c) is due to Property Q] (z'z) in Appendix O which 
implies that > <J 2 - 

B. Proof of Theorem 1

Part (i) of the theorem is an immediate corollary of [33, Theorem 1] whose statement is
provided in Theorem 6 of Appendix D. The main technical difficulty is to prove Part (ii). Our
starting point is the following proposition which relates the optimal average error probability
$\mathbb{P}_e(r|\beta,c,\Gamma)$ to the statistics of the mutual information density and which is proved by standard
arguments of information spectrum methods.

Proposition 1 (Bounds on the optimal average error probability): Denote $X^n = (x_1, \ldots, x_n) \in
\mathbb{C}^{K\times n}$, with $x_i = (x_{1i}, \ldots, x_{Ki})^{\mathsf T}$. For each $n = 1, 2, \ldots$, and each $\Gamma, \nu > 0$, define $\mathcal{S}^n_{\Gamma,\nu} \subset
\mathcal{P}(\mathbb{C}^{K\times n})$ by

Sl v = h xn = ]jF Xki | ^^Var(|x fcl | 2 ) < „, -Le [tr (X«(X«) H )] < r| . (79) 

Then, the following two statements hold for the optimal average error probability for the
second-order coding rate $r$:

(i) Lower Bound: Let $X^n$ be an arbitrary random variable with probability measure $P_{X^n} \in
\mathcal{P}(\mathcal{S}^n_\Gamma)$, and let $Y^n$ denote the random variable associated to the output of the channel
corresponding to the input $X^n$ and fading $H^n$. Then

F e (r\f3,c,T)>¥(r\f3,c,T)^ 



inf sup lim lim sup Pr 



^U log Q4^) C J 



<r-£ 
(80) 



where $Q^n$ is an $(H^n, X^n)$-measurable random variable taking values in $\mathcal{P}(\mathbb{C}^{N\times n})$.
(ii) Upper Bound: There exists a codebook of size $M_n$ with codewords of block length $n$ that
together with the MAP decoder form a $(P_e^{(n)}, M_n, \Gamma)$-code $\mathcal{C}_n$ such that, for all real $r$,



P e (r|/3,c,r)<G(r|/3,c,r) 



inf lim lim sup Pr 



^Uf log p yB , gB (dy»|g") c J- r + 



^n->0 

(81) 

Proof: The proof of Proposition 1 is provided in Appendix A-C. ■
The main problem in studying the optimal average error probability lies in the difficulty to
perform any analytical calculus on the information spectrum of $P_{Y^n|X^n,H^n}$, unless the underlying
distributions (of $X^n$, $Y^n|X^n,H^n$, or $Y^n|H^n$) are Gaussian. Proposition 1 precisely handles this
difficulty. Indeed, first note that the lower bound (80) can be further bounded by the same
expression with $Q^n$ chosen arbitrarily. In the proof, we shall then take $Q^n$ Gaussian with
appropriate mean and variance. As for (81), note that it allows one to derive upper bounds
on the average error probability for codes with inputs in $\mathcal{S}^n_\Gamma$ by means of a larger set of codes
with input distributions in $\mathcal{S}^n_{\Gamma,\nu}$. In particular, it presently allows us to consider Gaussian $X^n$ to
obtain the sought-for upper bound. Without this result, Gaussian inputs would have a probability
1/2 not to fall within $\mathcal{S}^n_\Gamma$ in the large $n$ limit, which would bring an additional factor 2 in the
upper bound of Theorem 1.

With this remark in mind, we now proceed to the derivation of the lower and upper bounds
of $\mathbb{P}_e(r|\beta,c) = \mathbb{P}_e(r|\beta,c,1)$ as given in Theorem 1 (ii).
1) Proof of the lower bound on the optimal average error probability: From (80) with $\Gamma = 1$,
we have in particular

$$\mathbb{P}_e(r|\beta,c) \ge \inf_{\substack{\{P_{X^n}\}_{n=1}^{\infty}\\ P_{X^n}\in\mathcal{P}(\mathcal{S}^n)}} \lim_{\xi\downarrow 0}\ \limsup_{n \xrightarrow{(\beta,c)} \infty} \Pr\!\left[\sqrt{nK}\left(\frac{1}{nK}\log\frac{P_{Y^n|X^n,H^n}(dY^n|X^n,H^n)}{Q^n(dY^n)} - C\right) \le r - \xi\right] \tag{82}$$

where, for fixed $H^n$, $Q^n$ is taken to be complex Gaussian with zero mean and covariance matrix
$\frac{1}{K}H^n(H^n)^{\mathsf H} + \sigma^2 I_N$ (i.e., the distribution of $Y^n|H^n$ in the model (10) obtained from independent
complex Gaussian inputs $x_t$, $t = 1, \ldots, n$, with zero mean and covariance $I_K$). From (13) with
$\Gamma = 1$, we therefore obtain



FJr\B, c) > inf lim lim sup Pr VnK (l$ n K - C) < r - £ 
{P X neP(5")}- =1 ao WiC) L v N ' K ' 

n ' — ^oo 



(83) 



inf lim lim sup / Pr V^K - C) < r - f P X n(dX n ) (84) 
? x-.eP(S")}» ! ao (/3 , c) ,/s« L v JV ' K ; J 

n — ' — too 



where we defined 
1 

K 



= -^logdet( Ijv4 



n l Zjn\ H 



1 # n (if 

^2 



nK 



rnf rxn\ H 



— — + 



-i 



x. n + w n 



W n {W n ) H 
(85) 



and 



K 



H tr 



n I rjn\ H 



- v 1 i , / . 1 H n (H 
i nk = — logdet ( I N ' 



a 2 K 

H n( H n\H 



K 



+ o 2 I N 



x n + w n 



H r 



K 



x n + w n 



n ffirn\ H 



W n (W 



(86) 



Note that the only difference between the two expressions is that (85) assumes a specific
realization of the input, while in (86) the input $X^n$ is a random variable. Additionally, for
$X^n \in \mathbb{C}^{K\times n}$, define

$$A^n = I_K - \frac{1}{n}X^n(X^n)^{\mathsf H}. \tag{87}$$

For $X^n \in \mathcal{S}^n$, we have in particular $0 \le \frac{1}{K}\operatorname{tr}A^n \le 1$. When the input is random, we denote
in the same way by $A^n$ the random variable

$$A^n = I_K - \frac{1}{n}X^n(X^n)^{\mathsf H} \tag{88}$$

which, for $P_{X^n} \in \mathcal{P}(\mathcal{S}^n)$, satisfies $0 \le \frac{1}{K}\operatorname{tr}A^n \le 1$ almost surely.

In the following, we analyze in detail some of the asymptotic properties of these two quantities
as $n \xrightarrow{(\beta,c)} \infty$. This analysis makes extensive use of advanced tools from the field of random matrix
theory, and most particularly of the characteristic function approach due to Pastur, see, e.g., [33],
along with the so-called integration by parts formula for Gaussian vectors (Lemma 8) and the
Poincaré-Nash inequality (Lemma 9) which are recalled in Appendix D. In contrast to the usual
setting of large random matrix theory, the matrices $A^n$ are only (almost surely) bounded in trace
rather than in their spectral norm. This complicates the analysis on many occasions.

The first result demonstrates that for a fixed $X^n$, $I_{N,K}^{X^n}$ has a mean close to $C - \zeta_0\frac{1}{K}\operatorname{tr}A^n$
(with $\zeta_0$ defined in Section IV-A) and a variance of order $\frac{1}{K}\operatorname{tr}(A^n)^2$.



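Proposition 2: Let $\{X^n\}_{n=1}^{\infty}$ be a sequence of matrices with $X^n \in \mathcal{S}^n$ and denote $A^n =
I_K - \frac{1}{n}X^n(X^n)^{\mathsf H}$. Define $I_{N,K}^{X^n}$ as in (85) and $\zeta_0$ as in (65). Then,

$$\mathbb{E}\!\left[\left(\sqrt{nK}\left(I_{N,K}^{X^n} - C + \zeta_0\frac{1}{K}\operatorname{tr}A^n\right)\right)^{2}\right] = O\!\left(1 + \frac{1}{K}\operatorname{tr}(A^n)^2\right). \tag{89}$$

Proof: The proof is provided in Appendix B-D. ■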

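Next, we show that, under some conditions on $X^n$, $I_{N,K}^{X^n}$, when properly centered and scaled,
converges weakly to a standard normal random variable:

Theorem 3: Let $\zeta_0$, $\zeta_1$, $\zeta_2$, and $\theta_-$ be defined as in (65)-(68) and (26). Let $\{X^n\}_{n=1}^{\infty}$
be a sequence of random variables with probability $P_{X^n} \in \mathcal{P}(\mathcal{S}^n)$ and, for $A^n = I_K - \frac{1}{n}X^n(X^n)^{\mathsf H}$,
define $\theta_n$ the random variable given by

$$\theta_n = \sqrt{\theta_-^2 + \zeta_1\!\left(\frac{1}{K}\operatorname{tr}A^n\right) + \zeta_2\frac{1}{K}\operatorname{tr}(A^n)^2} \tag{90}$$

if the argument of the square root is nonnegative or $\theta_n = 1$ otherwise. Suppose further that
there exists $\eta > 0$ such that, for each $n$, $\theta_n > \eta$ almost surely. Then, for any real $x$, as $n \xrightarrow{(\beta,c)} \infty$,

$$\Pr\!\left[\frac{\sqrt{nK}}{\theta_n}\left(I_{N,K}^{X^n} - C + \zeta_0\frac{1}{K}\operatorname{tr}A^n\right) \le x\right] \to \Phi(x). \tag{91}$$

Proof: The proof is provided in Appendix B-B. ■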

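Equipped with the results above, we are now ready to proceed with the main proof. This
is an adaptation of the converse proof by Hayashi in [11] to our channel model. Take $\epsilon > 0$
arbitrarily small satisfying $\sup_{x\in[0,\epsilon]}|\zeta_1(x)| < \theta_-^2$, which always exists since $\zeta_1(x) \to 0$ as $x \to 0$
and $\theta_-^2 > 0$. This condition will be needed later on. Further define the sets

$$\mathcal{V}_\epsilon^n = \left\{X^n : 0 \le \frac{1}{nK}\operatorname{tr}X^n(X^n)^{\mathsf H} \le 1 - \epsilon\right\} = \left\{X^n : \epsilon \le \frac{1}{K}\operatorname{tr}A^n \le 1\right\} \tag{92}$$

$$(\mathcal{V}_\epsilon^n)^c = \left\{X^n : 1 - \epsilon < \frac{1}{nK}\operatorname{tr}X^n(X^n)^{\mathsf H} \le 1\right\} = \left\{X^n : 0 \le \frac{1}{K}\operatorname{tr}A^n < \epsilon\right\}. \tag{93}$$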



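Take $P_{X^n} \in \mathcal{P}(\mathcal{S}^n)$ and $\xi > 0$ arbitrary with $\xi < |r|$ if $r \ne 0$. Then

$$\int_{\mathcal{S}^n}\Pr\!\left[\sqrt{nK}\left(I_{N,K}^{X^n} - C\right) \le r - \xi\right]P_{X^n}(dX^n) = \int_{\mathcal{V}_\epsilon^n}\Pr\!\left[\sqrt{nK}\left(I_{N,K}^{X^n} - C\right) \le r - \xi\right]P_{X^n}(dX^n) + \int_{(\mathcal{V}_\epsilon^n)^c}\Pr\!\left[\sqrt{nK}\left(I_{N,K}^{X^n} - C\right) \le r - \xi\right]P_{X^n}(dX^n). \tag{94}$$

We shall treat both right-hand side (RHS) terms individually. Intuitively, if $X^n \in \mathcal{V}_\epsilon^n$, the
energy of the input codeword is too weak to achieve a rate $C + r/\sqrt{nK}$, and therefore we will
show that the integrand of the first term of the last equation tends to 1. If instead $X^n \in (\mathcal{V}_\epsilon^n)^c$,
by properly setting $\epsilon$, the second term will be non-trivial. The need to isolate both terms comes
from the fact that the random variable $\sqrt{nK}(I_{N,K}^{X^n} - C)$ has a variance that is neither bounded
away from zero nor bounded from above if $X^n \in \mathcal{V}_\epsilon^n$ as $n \xrightarrow{(\beta,c)} \infty$; hence the leftmost term
needs to be controlled separately.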



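Set $\mathcal{V}_\epsilon^n$: Consider first a sequence $\{X^n\}_{n=1}^{\infty}$ with $X^n \in \mathcal{V}_\epsilon^n$ for each $n$. For these $X^n$, $\sqrt{nK}(I_{N,K}^{X^n} -
C)$ has a diverging mean. Inspired by Proposition 2, it is natural to introduce the centering term
$\sqrt{nK}\,\zeta_0\frac{1}{K}\operatorname{tr}A^n$. With this idea in mind, we have the following chain of immediate inequalities:

$$\Pr\!\left[\sqrt{nK}\left(I_{N,K}^{X^n} - C\right) \le r - \xi\right] = \Pr\!\left[\sqrt{nK}\left(I_{N,K}^{X^n} - C + \zeta_0\frac{1}{K}\operatorname{tr}A^n\right) \le r - \xi + \zeta_0\sqrt{\frac{n}{K}}\operatorname{tr}A^n\right] \tag{95}$$

$$\ge \Pr\!\left[\left|\sqrt{nK}\left(I_{N,K}^{X^n} - C + \zeta_0\frac{1}{K}\operatorname{tr}A^n\right)\right| \le r - \xi + \zeta_0\sqrt{\frac{n}{K}}\operatorname{tr}A^n\right] \tag{96}$$

$$= 1 - \Pr\!\left[\left|\sqrt{nK}\left(I_{N,K}^{X^n} - C + \zeta_0\frac{1}{K}\operatorname{tr}A^n\right)\right| > r - \xi + \zeta_0\sqrt{\frac{n}{K}}\operatorname{tr}A^n\right]. \tag{97}$$

Since $X^n \in \mathcal{V}_\epsilon^n$, it holds that $\operatorname{tr}A^n \ge K\epsilon$. Thus, since $\zeta_0 > 0$, for sufficiently large $n$
(independent of $X^n$),

$$r - \xi + \zeta_0\sqrt{\frac{n}{K}}\operatorname{tr}A^n \ge r - \xi + \zeta_0\sqrt{nK}\,\epsilon > 0. \tag{98}$$

For these large $n$, we can therefore apply the Markov inequality to obtain:

$$\Pr\!\left[\left|\sqrt{nK}\left(I_{N,K}^{X^n} - C + \zeta_0\frac{1}{K}\operatorname{tr}A^n\right)\right| > r - \xi + \zeta_0\sqrt{\frac{n}{K}}\operatorname{tr}A^n\right] \le \frac{\mathbb{E}\!\left[\left(\sqrt{nK}\left(I_{N,K}^{X^n} - C + \zeta_0\frac{1}{K}\operatorname{tr}A^n\right)\right)^2\right]}{\left(r - \xi + \zeta_0\sqrt{\frac{n}{K}}\operatorname{tr}A^n\right)^2} \tag{99}$$

$$\le \frac{\mathbb{E}\!\left[\left(\sqrt{nK}\left(I_{N,K}^{X^n} - C + \zeta_0\frac{1}{K}\operatorname{tr}A^n\right)\right)^2\right]}{\left(r - \xi + \zeta_0\sqrt{nK}\,\epsilon\right)^2}. \tag{100}$$

Applying Proposition 2 to (99) and using $\frac{1}{K}\operatorname{tr}(A^n)^2 \le K$ (since $\frac{1}{K}\operatorname{tr}A^n \le 1$), we show
precisely that

$$\Pr\!\left[\left|\sqrt{nK}\left(I_{N,K}^{X^n} - C + \zeta_0\frac{1}{K}\operatorname{tr}A^n\right)\right| > r - \xi + \zeta_0\sqrt{\frac{n}{K}}\operatorname{tr}A^n\right] \le \frac{M}{n} \tag{101}$$

for some positive constant $M$ which does not depend on $X^n$ and $n$. Using this result in (97)
leads to

$$\Pr\!\left[\sqrt{nK}\left(I_{N,K}^{X^n} - C\right) \le r - \xi\right] \ge 1 - \frac{M}{n} \tag{102}$$

from which we can finally conclude that

$$\int_{\mathcal{V}_\epsilon^n}\Pr\!\left[\sqrt{nK}\left(I_{N,K}^{X^n} - C\right) \le r - \xi\right]P_{X^n}(dX^n) \ge \left(1 - \frac{M}{n}\right)P_{X^n}(\mathcal{V}_\epsilon^n). \tag{103}$$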



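Set $(\mathcal{V}_\epsilon^n)^c$: Assume first that $P_{X^n}((\mathcal{V}_\epsilon^n)^c) > 0$ and define $\tilde{X}^n$ a random variable with probability
measure $\tilde{P}_{X^n} \in \mathcal{P}((\mathcal{V}_\epsilon^n)^c)$ given, for a Borel set $\mathcal{A}$, by

$$\tilde{P}_{X^n}(\mathcal{A}) = \frac{P_{X^n}\left(\mathcal{A}\cap(\mathcal{V}_\epsilon^n)^c\right)}{P_{X^n}\left((\mathcal{V}_\epsilon^n)^c\right)}. \tag{104}$$

To $\tilde{X}^n$, we associate the random variable $\tilde{A}^n = I_K - \frac{1}{n}\tilde{X}^n(\tilde{X}^n)^{\mathsf H}$.
Let now $X^n \in (\mathcal{V}_\epsilon^n)^c$ and, for $A^n = I_K - \frac{1}{n}X^n(X^n)^{\mathsf H}$, let

$$\left(\theta_n^{X^n}\right)^2 = \theta_-^2 + \zeta_1\!\left(\frac{1}{K}\operatorname{tr}A^n\right) + \zeta_2\frac{1}{K}\operatorname{tr}(A^n)^2. \tag{105}$$

The term $(\theta_n^{X^n})^2$ naturally intervenes as the approximated variance of $\sqrt{nK}(I_{N,K}^{X^n} - C + \zeta_0\frac{1}{K}\operatorname{tr}A^n)$,
as evidenced by Theorem 3. By definition of $\epsilon$ and $(\mathcal{V}_\epsilon^n)^c$, $|\zeta_1(\frac{1}{K}\operatorname{tr}A^n)| \le \sup_{x\in[0,\epsilon]}|\zeta_1(x)| < \theta_-^2$.
Since $\zeta_2 > 0$, $(\theta_n^{X^n})^2$ then satisfies

$$\left(\theta_n^{X^n}\right)^2 \ge \theta_-^2 - \sup_{x\in[0,\epsilon]}|\zeta_1(x)| > 0. \tag{106}$$

Since this holds for each such $X^n$ and for all $n$, denoting

$$\theta_n^2 = \theta_-^2 + \zeta_1\!\left(\frac{1}{K}\operatorname{tr}\tilde{A}^n\right) + \zeta_2\frac{1}{K}\operatorname{tr}(\tilde{A}^n)^2 \tag{107}$$

we have

$$\theta_n^2 \ge \theta_-^2 - \sup_{x\in[0,\epsilon]}|\zeta_1(x)| > 0 \tag{108}$$

almost surely, so that the variance is uniformly bounded from below. For definiteness in the
following, we may set $\theta_n = -1$ on the set of probability zero for which $\theta_n^2 = 0$ and we denote
otherwise $\theta_n$ the positive square root of $\theta_n^2$. Centering again $I_{N,K}^{\tilde{X}^n} - C$ with $\zeta_0\frac{1}{K}\operatorname{tr}\tilde{A}^n$, we can
therefore apply Theorem 3 to obtain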



V^K lf K -C + ( -trA n ) < 



9,i 




^(/l^-C + Co^tri") <0 

r < 



r < 
r > 



(109) 



(HO) 



(HI) 



for some sequence $\ell_n \downarrow 0$, where (a) holds since $\theta_n^2 \ge \theta_-^2 - \sup_{x\in[0,\epsilon]}|\zeta_1(x)| > 0$ almost surely
and since we took $r - \xi > 0$ for $r > 0$, and (b) is a direct consequence of Theorem 3. The
term 1/2 arises from $\Phi(0) = 1/2$ which originates from $\theta_n$ not being bounded from above since
$\frac{1}{K}\operatorname{tr}(A^n)^2$ can grow like $O(n)$. Note the important role played by $\epsilon$ to control a lower bound on
$\theta_n$. Without the space separation between $\mathcal{V}_\epsilon^n$ and $(\mathcal{V}_\epsilon^n)^c$, the relation (a) may indeed not hold
(p-$\liminf \theta_n$, as defined in [6], may even be zero for a certain sequence $\{X^n\}_{n=1}^{\infty}$ but this is
difficult to verify as little can be said about $\zeta_1$ and $\zeta_2$).
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We can now conclude that 
Pr 



(yn )c 



(I*" K -C)<r-£ d¥x4dX r 



(«) 
> 



/ Pr 









K 



(6) 



> 



ffiV((V £ ") c ) | Pr v^^-C + Co^trA-J <r-£ 
P X n((V e n ) c ) Pr 

f .... I ... \ 



rfP X n(c/X n ) 

dPx„(dX n ) 



(112) 
(113) 
(114) 
(115) 



Px„((V £ ") c ) $ 



r < 



(116) 



Px«((V £ ") c )(i + £ n ) , r>0 

where (a) follows by the positivity of $\zeta_0\frac{1}{K}\operatorname{tr}A^n$, (b) uses the definition of the conditional
measure in (104), and (c) follows from (111).

If $P_{X^n}((\mathcal{V}_\epsilon^n)^c) = 0$, the result obviously still holds.
Gathering the results: From (103) and (116), we then have



Pr 



Pr V^tf (ij^ - C) < r - £ rfPx«((iX n ) 



+ 



Pr 



(V n)c 



v 7 ^ (/^f - C) < r - d d¥ X n(dX n ) 



(117) 



Px»(V?) (l-f)+Px"((V £ ") c ) U> 



+ 



r < 



(118) 



> 7 \ \\/ - _sup ^6[o, £ ]ICi(a;) 

Px«(V-) (1 - f ) +Px-((V £ ") C ) + , r > 0. 

Since $\Phi(x) \le 1$ for all real $x$, $\ell_n \downarrow 0$, and $P_{X^n}(\mathcal{V}_\epsilon^n) + P_{X^n}((\mathcal{V}_\epsilon^n)^c) = 1$, taking the limit superior
as $n \xrightarrow{(\beta,c)} \infty$ of the above equation leads to



lim sup Pr 

(|S,<0. 
n r oo 



> 



r < 



(119) 



, r > 0. 

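By continuity of $\Phi$, we can freely take the limit $\xi \downarrow 0$ on the right- then left-hand sides. As $\epsilon$
can be taken arbitrarily small, it follows again by continuity of $\Phi$ that

$$\lim_{\xi\downarrow 0}\ \limsup_{n \xrightarrow{(\beta,c)} \infty}\int_{\mathcal{S}^n}\Pr\!\left[\sqrt{nK}\left(I_{N,K}^{X^n} - C\right) \le r - \xi\right]P_{X^n}(dX^n) \ge \begin{cases}\Phi\!\left(\dfrac{r}{\theta_-}\right), & r \le 0\\[6pt] \dfrac{1}{2}, & r > 0.\end{cases} \tag{120}$$

Equation (120) is valid regardless of the choice of the sequence $\{P_{X^n} \in \mathcal{P}(\mathcal{S}^n)\}_{n=1}^{\infty}$. This
therefore implies

$$\mathbb{P}_e(r|\beta,c) \ge \begin{cases}\Phi\!\left(\dfrac{r}{\theta_-}\right), & r \le 0\\[6pt] \dfrac{1}{2}, & r > 0\end{cases} \tag{121}$$

which completes the proof.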

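2) Proof of the upper bound on the optimal average error probability: From (81) with $\Gamma = 1$,
we have in particular

$$\mathbb{P}_e(r|\beta,c) \le \lim_{\xi\downarrow 0}\ \limsup_{n \xrightarrow{(\beta,c)} \infty}\Pr\!\left[\sqrt{nK}\left(\frac{1}{nK}\log\frac{P_{Y^n|X^n,H^n}(dY^n|X^n,H^n)}{P_{Y^n|H^n}(dY^n|H^n)} - C\right) \le r + \xi\right] \tag{122}$$

where, denoting $X^n = (x_{ki})_{1\le k\le K,\,1\le i\le n}$, we define $P_{X^n} = \prod_{k=1}^{K}\prod_{i=1}^{n}P_{x_{ki}}$ with $P_{x_{ki}} = P_x$ for
$x \sim \mathcal{CN}(0,1)$ (recall that $P_{Y^n|H^n}$ is the distribution of the output obtained from input $X^n$). We then
have from (13)

$$\mathbb{P}_e(r|\beta,c) \le \lim_{\xi\downarrow 0}\ \limsup_{n \xrightarrow{(\beta,c)} \infty}\Pr\!\left[\sqrt{nK}\left(I_{N,K}^{X^n} - C\right) \le r + \xi\right] \tag{123}$$

with $I_{N,K}^{X^n}$ defined in (86) for this specific choice of $P_{X^n}$.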

In contrast to the proof of the lower bound, the matrices $X^n$ are standard complex Gaussian
matrices which are neither confined to be in $\mathcal{S}^n$ nor have bounded spectral norms. For this
reason, we require the following theorem in order to proceed:

Theorem 4: Let $\theta_+$ be defined in (27) and $I_{N,K}^{X^n}$ be given by (86) with $X^n = (x_{ij})_{1\le i\le K,\,1\le j\le n}$
distributed as $P_{X^n} = \prod_{i,j}P_{x_{ij}}$ where $P_{x_{ij}} = P_x$, $x \sim \mathcal{CN}(0,1)$. Then, for any real $z$, as
$n \xrightarrow{(\beta,c)} \infty$,

$$\Pr\!\left[\frac{\sqrt{nK}}{\theta_+}\left(I_{N,K}^{X^n} - C\right) \le z\right] \to \Phi(z). \tag{124}$$

Proof: The proof is provided in Appendix B-C. ■
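Applying Theorem 4 to (123) leads to

$$\limsup_{n \xrightarrow{(\beta,c)} \infty}\Pr\!\left[\sqrt{nK}\left(I_{N,K}^{X^n} - C\right) \le r + \xi\right] = \limsup_{n \xrightarrow{(\beta,c)} \infty}\Pr\!\left[\frac{\sqrt{nK}}{\theta_+}\left(I_{N,K}^{X^n} - C\right) \le \frac{r+\xi}{\theta_+}\right] \tag{125}$$

$$= \Phi\!\left(\frac{r+\xi}{\theta_+}\right) \tag{126}$$

which, along with the fact that

$$\lim_{\xi\downarrow 0}\Phi\!\left(\frac{r+\xi}{\theta_+}\right) = \Phi\!\left(\frac{r}{\theta_+}\right) \tag{127}$$

concludes the proof.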



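C. Proof of Theorem 2

The result follows from a close inspection of the steps of the proof of the upper-bound part
of Theorem 1 (ii) in Section IV-B2. Applying Lemma 1 in Section III-B for $\mathcal{A}^n = \mathcal{S}^n$, we have
that codes $\mathcal{C}_n$ with input distributions $P_{X^n} \in \mathcal{S}^n$ satisfy for each $\delta > 0$

$$P_e^{(n)}(\mathcal{C}_n) \le \Pr\!\left[I_{N,K}^{X^n} \le R_n + \delta\right] + \exp(-nK\delta) \tag{128}$$

which is obtained by taking $\frac{1}{nK}\log\gamma = \frac{1}{nK}\log M_n + \delta = R_n + \delta$ in (207), with $I_{N,K}^{X^n}$ defined
in (86). Now, denoting $X^n = (x_{ki})_{1\le k\le K,\,1\le i\le n}$, we take $P_{X^n} = \prod_{k,i}P_{x_{ki}}$ and $P_{x_{ki}} = P_x$ with
$x \sim \mathcal{CN}(0,1)$. From Theorem 4 in Section IV-B2, for any $x \in \mathbb{R}$, we know that

$$\Pr\!\left[\sqrt{nK}\left(I_{N,K}^{X^n} - C\right) \le x\right] \to \Phi\!\left(\frac{x}{\theta_+}\right) \tag{129}$$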



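as $n \xrightarrow{(\beta,c)} \infty$. Before we can proceed, we need to show that the convergence in (129) also holds
uniformly over $x \in \mathbb{R}$. To see this, take $\epsilon > 0$ and let $x_1 < \ldots < x_M$ be such that

$$\Phi(\theta_+^{-1}x_1) < \epsilon \tag{130}$$

$$\Phi(\theta_+^{-1}x_M) > 1 - \epsilon \tag{131}$$

$$\Phi(\theta_+^{-1}x_{i+1}) - \Phi(\theta_+^{-1}x_i) < \epsilon, \quad i = 1, \ldots, M-1. \tag{132}$$

This is always possible since $\Phi(\theta_+^{-1}x)$ is monotonic in $x$, continuous, and bounded. From (129)
applied to $x_1, \ldots, x_M$, we also have that for all large $n$,

$$\left|\Pr\!\left[\sqrt{nK}\left(I_{N,K}^{X^n} - C\right) \le x_1\right] - \Phi(\theta_+^{-1}x_1)\right| < \epsilon \tag{133}$$

$$\left|\Pr\!\left[\sqrt{nK}\left(I_{N,K}^{X^n} - C\right) \le x_M\right] - \Phi(\theta_+^{-1}x_M)\right| < \epsilon \tag{134}$$

$$\left|\Pr\!\left[x_i < \sqrt{nK}\left(I_{N,K}^{X^n} - C\right) \le x_{i+1}\right] - \left(\Phi(\theta_+^{-1}x_{i+1}) - \Phi(\theta_+^{-1}x_i)\right)\right| < \epsilon, \quad i = 1, \ldots, M-1 \tag{135}$$

and then, gathering the two sets of equations, we have for these large $n$

$$\Pr\!\left[\sqrt{nK}\left(I_{N,K}^{X^n} - C\right) \le x_1\right] < 2\epsilon \tag{136}$$

$$\Pr\!\left[\sqrt{nK}\left(I_{N,K}^{X^n} - C\right) \le x_M\right] > 1 - 2\epsilon \tag{137}$$

$$\Pr\!\left[x_i < \sqrt{nK}\left(I_{N,K}^{X^n} - C\right) \le x_{i+1}\right] < 2\epsilon, \quad i = 1, \ldots, M-1. \tag{138}$$

Since, for any $x \in \mathbb{R}$, there exists $i \in \{0, \ldots, M\}$ for which $x_i \le x \le x_{i+1}$, where $x_0 = -\infty$
and $x_{M+1} = \infty$, and since both $\Phi(\theta_+^{-1}x)$ and $\Pr[\sqrt{nK}(I_{N,K}^{X^n} - C) \le x]$ are monotonic, we
find that for these large $n$,

$$\sup_{x\in\mathbb{R}}\left|\Pr\!\left[\sqrt{nK}\left(I_{N,K}^{X^n} - C\right) \le x\right] - \Phi\!\left(\frac{x}{\theta_+}\right)\right| \le 3\epsilon + \max_{1\le i\le M}\left|\Pr\!\left[\sqrt{nK}\left(I_{N,K}^{X^n} - C\right) \le x_i\right] - \Phi\!\left(\frac{x_i}{\theta_+}\right)\right| \tag{139}$$

$$\le 4\epsilon. \tag{140}$$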
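
The existence of the grid $x_1 < \ldots < x_M$ used in this argument can be made concrete. The sketch below places the points at equally spaced probability levels $k/m$ with $m = \lceil 2/\epsilon \rceil$, so that the first level is below $\epsilon$, the last one is above $1 - \epsilon$, and consecutive levels differ by less than $\epsilon$. This particular construction and the function name are our own illustrative choices, and the value of $\theta_+$ in the usage example is arbitrary; any grid with these three properties serves equally well.

```python
import math
from statistics import NormalDist


def uniform_cdf_grid(eps, theta_plus=1.0):
    """Points x_1 < ... < x_M with Phi(x_k / theta_plus) = k/m, where m = ceil(2/eps)."""
    m = math.ceil(2 / eps)  # each probability step 1/m is at most eps/2
    std_normal = NormalDist()
    return [theta_plus * std_normal.inv_cdf(k / m) for k in range(1, m)]


if __name__ == "__main__":
    eps, theta_plus = 0.1, 1.3  # illustrative values only
    grid = uniform_cdf_grid(eps, theta_plus)
    levels = [NormalDist().cdf(x / theta_plus) for x in grid]
    print(len(grid), levels[0], levels[-1])
```

The number of grid points grows like $2/\epsilon$, which is finite for every fixed $\epsilon > 0$; this is what allows the pointwise convergence at the $M$ grid points to be upgraded to uniform convergence.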



Taking the limsup on the left-hand side (LHS) and using the fact that e > is arbitrary, we 
finally have 



sup 



Pr 



^K(Ink -G)< 



x 



*" 1 i 



-> o 



(141) 



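as $n \xrightarrow{(\beta,c)} \infty$. In particular, for any sequence $\{x_n\}_{n=1}^{\infty}$, $x_n \in \mathbb{R}$,

$$\left|\Pr\left[\sqrt{nK}\left(I_{N,K}^{(n)} - C\right) \le x_n\right] - \Phi\left(\theta_+^{-1}x_n\right)\right| \le \sup_{x\in\mathbb{R}}\left|\Pr\left[\sqrt{nK}\left(I_{N,K}^{(n)} - C\right) \le x\right] - \Phi\left(\theta_+^{-1}x\right)\right| \to 0 \tag{142}$$

as $n \xrightarrow{(\beta,c)} \infty$. Choose in particular $x_n = \sqrt{nK}(R_n - C + \delta_n^*)$ with $\delta_n^*$ defined as

$$\delta_n^* = \arg\inf_{\delta > 0}\left\{\Phi\left(\theta_+^{-1}\sqrt{nK}(R_n - C + \delta)\right) + e^{-nK\delta}\right\}. \tag{143}$$

Standard calculus confirms that (58) corresponds to this definition. We then have

$$\left(\Pr\left[I_{N,K}^{(n)} \le R_n + \delta_n^*\right] + e^{-nK\delta_n^*}\right) - \left(\Phi\left(\theta_+^{-1}\sqrt{nK}(R_n - C + \delta_n^*)\right) + e^{-nK\delta_n^*}\right) \to 0 \tag{144}$$

as $n \xrightarrow{(\beta,c)} \infty$. Letting similarly $\delta_n^\circ$ be one of the arguments of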



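$$\inf_{\delta > 0}\left\{\Pr\left[I_{N,K}^{(n)} \le R_n + \delta\right] + e^{-nK\delta}\right\} \tag{145}$$

we have

$$\left(\Pr\left[I_{N,K}^{(n)} \le R_n + \delta_n^\circ\right] + e^{-nK\delta_n^\circ}\right) - \left(\Phi\left(\theta_+^{-1}\sqrt{nK}(R_n - C + \delta_n^\circ)\right) + e^{-nK\delta_n^\circ}\right) \to 0. \tag{146}$$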

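On the other hand, it follows from the definition of $\delta_n^\circ$ that

$$\left(\Pr\left[I_{N,K}^{(n)} \le R_n + \delta_n^*\right] + e^{-nK\delta_n^*}\right) - \left(\Pr\left[I_{N,K}^{(n)} \le R_n + \delta_n^\circ\right] + e^{-nK\delta_n^\circ}\right) \ge 0 \tag{147}$$

for each $n$, and, by definition of $\delta_n^*$,

$$\left(\Phi\left(\theta_+^{-1}\sqrt{nK}(R_n - C + \delta_n^*)\right) + e^{-nK\delta_n^*}\right) - \left(\Phi\left(\theta_+^{-1}\sqrt{nK}(R_n - C + \delta_n^\circ)\right) + e^{-nK\delta_n^\circ}\right) \le 0 \tag{148}$$

for each $n$. Combining (144), (146), (147), and (148) finally gives

$$\left(\Pr\left[I_{N,K}^{(n)} \le R_n + \delta_n^\circ\right] + e^{-nK\delta_n^\circ}\right) - \left(\Phi\left(\theta_+^{-1}\sqrt{nK}(R_n - C + \delta_n^*)\right) + e^{-nK\delta_n^*}\right) \to 0 \tag{149}$$

as $n \xrightarrow{(\beta,c)} \infty$.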

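Therefore,

$$P_e^{(n)}(\mathcal{C}_n) \le \Pr\left[I_{N,K}^{(n)} \le R_n + \delta_n^\circ\right] + \exp(-nK\delta_n^\circ) = \Phi\left(\theta_+^{-1}\sqrt{nK}(R_n - C + \delta_n^*)\right) + e^{-nK\delta_n^*} + \xi_n \tag{150}$$

for some sequence $\xi_n \to 0$. Since $\mathbb{P}_e^{(n)}(r) \le P_e^{(n)}(\mathcal{C}_n)$ by definition, this concludes the proof.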
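The leading term of (150) is easy to evaluate numerically once $C$ and $\theta_+$ are known. The sketch below assumes both are supplied by the caller (their closed forms are given elsewhere in the paper and are not reproduced here) and replaces the exact minimization in (143) by a plain grid search over $\delta$. The function name `gaussian_upper_bound` and the default search window are illustrative choices, not part of the paper.

```python
import math
from statistics import NormalDist


def gaussian_upper_bound(n, K, rate, capacity, theta_plus, delta_max=None, num=2000):
    """Grid-search the minimum over delta > 0 of

        Phi(sqrt(nK) * (rate - capacity + delta) / theta_plus) + exp(-nK * delta).

    Returns (bound, delta) at the best grid point.
    """
    nK = n * K
    root = math.sqrt(nK)
    if delta_max is None:
        # Heuristic search window (an assumption of this sketch).
        delta_max = abs(capacity - rate) + 10.0 * theta_plus / root
    phi = NormalDist().cdf
    best_val, best_delta = float("inf"), None
    for j in range(1, num + 1):
        delta = delta_max * j / num
        val = phi(root * (rate - capacity + delta) / theta_plus) + math.exp(-nK * delta)
        if val < best_val:
            best_val, best_delta = val, delta
    return best_val, best_delta
```

For example, `gaussian_upper_bound(100, 4, 0.8, 1.0, 1.0)` evaluates the approximation for a rate 20% below an assumed capacity of 1 with an assumed $\theta_+ = 1$. Increasing the rate toward the capacity on a fixed grid can only increase the returned value, because the $\Phi$ term is increasing in the rate.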

V. Summary and directions for future work 

Using information spectrum methods and Gaussian tools from random matrix theory, we have 
studied the second-order coding rate of the MIMO Rayleigh block-fading channel. In order to 
give a meaning to the otherwise ill-defined second-order coding rate for block-fading channels, 
we have considered the asymptotic regime where the channel dimensions as well as the block 
length grow infinitely large at the same speed and the code rate is a perturbation of $O(1/\sqrt{nK})$ of
the asymptotic capacity. For this setting, we have derived closed-form upper and lower bounds 
on the optimal average error probability which depend solely on the most important system 
parameters. By a Gaussian approximation of the mutual information density and a generalization 
of Feinstein's lemma, we have further obtained an error probability upper bound for finite system 
dimensions which has been shown to be accurate for short block lengths and realistic numbers 
of transmit and receive antennas. This bound exhibits a surprising similarity to a general scaling 
law for iteratively decoded graph-codes conjectured in [20].

The proposed method to study the asymptotic statistics of the mutual information density for
MIMO channels is new and could be further applied to other settings of interest. These comprise, 
for example, (i) the block-fading regime where coding is done over multiple coherence blocks, 
(ii) the derivation of error-probability bounds with imperfect channel state information which 
would enable the study of the trade-off between channel training and data transmission, (iii) 
the derivation of CLTs for the mutual information density with linear receive filters, and (iv) a 
further investigation of the relation between the scaling parameters conjectured in [20] and the 
information spectrum. 
Appendix A
Results on information spectrum

In this appendix, we provide and prove several important results related to the mutual information density which are (indirectly) needed for the proofs of the main theorems in Section III. In the first part, Appendix A-A, we prove three auxiliary results. Lemma 2 establishes the asymptotic non-negativity of the Kullback-Leibler divergence, which is needed for the proof of Proposition 3 in Appendix A-A. Lemma 3 provides a lower bound on the average probability of error for codes of arbitrary length with inputs issued from $P_{X^n} \in \mathcal{P}(\mathcal{S}_\Gamma^n)$ and will be required in the proof of Proposition 1 in Section IV-B. The third result, Proposition 3, shows that the upper bound on the optimal average error probability $G(r|\beta,c,\Gamma)$, as introduced in Proposition 1 in Section IV-B, is non-increasing and convex in $\Gamma$. These properties will also be needed for the proof of Proposition 1.

In the second part, Appendix A-B, we present the proof of Lemma 1 in Section III-B, which provides an upper bound on the average error probability of codes of finite length, generated from an arbitrary input distribution. This lemma is needed in the derivation of Theorem 2 in Section IV-C.

The third part of this appendix, Appendix A-C, contains the proof of Proposition 1 in Section IV-B, which is the cornerstone result for the proof of Theorem 1 in Section III. This result provides a lower and an upper bound on the optimal average error probability in the second-order coding rate $\mathbb{P}_e(r|\beta,c,\Gamma)$.

A. Auxiliary results 

The first lemma will be needed to prove Proposition 3 in Appendix A-A. The lemma states the asymptotic non-negativity of the Kullback-Leibler divergence [34].

Lemma 2 (Asymptotic non-negativity of the divergence): Let $\{P_n\}_{n=1}^{\infty}$ and $\{Q_n\}_{n=1}^{\infty}$ be two arbitrary sequences of probability measures on $\mathbb{C}^{N\times n}$, $X^n$ be a random variable over $\mathbb{C}^{N\times n}$, and let $x > 0$. Then

$$\liminf_{n \xrightarrow{(\beta,c)} \infty} \Pr\left[\frac{1}{\sqrt{nK}}\log\frac{P_n(dX^n)}{Q_n(dX^n)} > -x\right] = 1. \tag{151}$$

Proof: We define the set

$$A_n = \left\{X \in \mathbb{C}^{N\times n} \,\Big|\, \frac{1}{\sqrt{nK}}\log\frac{P_n(dX)}{Q_n(dX)} \le -x \text{ and } P_n \ll Q_n\right\} \tag{152}$$
which satisfies

$$\Pr[A_n] = \int_{A_n} P_n(dX) \tag{153}$$
$$\le \exp\left(-\sqrt{nK}\,x\right)\int_{A_n} Q_n(dX) \tag{154}$$
$$\le \exp\left(-\sqrt{nK}\,x\right). \tag{155}$$

Then, by taking the limit inferior as $n \xrightarrow{(\beta,c)} \infty$, we obtain

$$\liminf_{n \xrightarrow{(\beta,c)} \infty} \Pr[A_n] = 0. \tag{156}$$

This concludes the proof by noting that the set of $X \in \mathbb{C}^{N\times n}$ not in $A_n$ for which $P_n \ll Q_n$ does not hold is (contained within) a set of zero measure. ■
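The change-of-measure bound in (153)-(155) can be checked on a toy case. The sketch below assumes two scalar real Gaussians, $P = \mathcal{N}(0,1)$ and $Q = \mathcal{N}(\mu,1)$, which are not the measures of the lemma but make the probability available in closed form. The parameter `t` plays the role of $\sqrt{nK}\,x$, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def change_of_measure_check(mu, t):
    """For P = N(0,1) and Q = N(mu,1) with mu > 0, return the pair
    (Pr_P[log dP/dQ <= -t], exp(-t)); the first never exceeds the second.
    """
    if mu <= 0 or t < 0:
        raise ValueError("need mu > 0 and t >= 0")
    # log dP/dQ(x) = -mu*x + mu^2/2, so the event is {x >= mu/2 + t/mu}.
    threshold = mu / 2.0 + t / mu
    prob = 1.0 - NormalDist().cdf(threshold)
    return prob, math.exp(-t)
```

On the event in question, $P(dX) \le e^{-t} Q(dX)$, so integrating over the event gives the bound regardless of $\mu$. This is the same mechanism that makes $\Pr[A_n]$ vanish as $\sqrt{nK}\,x$ grows.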

We now state a variation of Verdú-Han's lemma [15] which appears to be more adequate to characterize the second-order approximation of the error probability. The proof is very similar to that in [15], which itself is based upon the results in [35], [10]. Note that a similar result was already used in [11] without an explicit proof.

Lemma 3 (Variation on Verdú-Han's lemma): For integer $n \ge 1$, let $X^n$ be an arbitrary random variable uniformly distributed over the set of $M_n$ messages issued from $M_n$ realizations of $P_{X^n} \in \mathcal{P}(\mathcal{S}_\Gamma^n)$, and let $Y^n$ be the output random variable of the channel $P_{Y^n|X^n,H^n}$ corresponding to the input $X^n$ and the random fading $H^n$. Then, the average error probability of such a $(P_e^{(n)}, M_n, \Gamma)$-code $\mathcal{C}_n$ must satisfy

$$P_e^{(n)}(\mathcal{C}_n) \ge \sup_{\gamma>0}\,\sup_{\{Q_n\}_{n=1}^{\infty}}\left\{\Pr\left[\log\frac{P_{Y^n|X^n,H^n}(dY^n|X^n,H^n)}{Q_n(dY^n)} \le \log\gamma\right] - \frac{\gamma}{M_n}\right\} \tag{157}$$

where $Q_n$ is an $H^n$, $X^n$-measurable random variable valued in $\mathcal{P}(\mathbb{C}^{N\times n})$.

Proof: Let $\gamma > 0$ and let $\mathcal{M}_n = \{1,\ldots,M_n\}$ denote the set of uniformly distributed messages. Consider a $(P_e^{(n)}, M_n, \Gamma)$-code $\mathcal{C}_n$ defined by a codebook $\{X_1^n,\ldots,X_{M_n}^n\}$ together with a partition of $\mathbb{C}^{N\times n}$ into disjoint decoding regions $\{\mathcal{D}_{H^n,1},\ldots,\mathcal{D}_{H^n,M_n}\}$ for each fading channel $H^n$. For an arbitrary $Q_n$ as defined in the statement of the lemma, define the set of messages and output sequences

$$A_{H^n} = \left\{(m, Y^n) : \log\frac{P_{Y^n|X^n,H^n}(dY^n|X_m^n,H^n)}{Q_n(dY^n)} \le \log\gamma \text{ and } P_{Y^n|X^n,H^n}(\,\cdot\,|X_m^n,H^n) \ll Q_n\right\}. \tag{158}$$
We have to prove that

$$\Pr[A_{H^n}] \le P_e^{(n)}(\mathcal{C}_n) + \frac{\gamma}{M_n}. \tag{159}$$

Let us define the following sets:

$$B_{H^n}(m) = \left\{Y^n \in \mathbb{C}^{N\times n} : P_{Y^n|X^n,H^n}(dY^n|X_m^n,H^n) \le \gamma\, Q_n(dY^n)\right\} \tag{160}$$

(in the latter and below, $Q_n$ is implicitly understood as the value in $\mathcal{P}(\mathbb{C}^{N\times n})$ taken by the random $Q_n$ for $X_m^n$, $H^n$ fixed). Then we have the following chain of inequalities:



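$$\Pr[A_{H^n}] \overset{(a)}{=} \mathbb{E}\left[\sum_{m=1}^{M_n}\frac{1}{M_n}P_{Y^n|X^n,H^n}\left(B_{H^n}(m)\cap\mathcal{D}_{H^n,m}^c \,\big|\, X_m^n,H^n\right)\right] + \mathbb{E}\left[\sum_{m=1}^{M_n}\frac{1}{M_n}P_{Y^n|X^n,H^n}\left(B_{H^n}(m)\cap\mathcal{D}_{H^n,m} \,\big|\, X_m^n,H^n\right)\right] \tag{161}$$
$$\overset{(b)}{\le} P_e^{(n)}(\mathcal{C}_n) + \mathbb{E}\left[\sum_{m=1}^{M_n}\frac{1}{M_n}P_{Y^n|X^n,H^n}\left(B_{H^n}(m)\cap\mathcal{D}_{H^n,m} \,\big|\, X_m^n,H^n\right)\right] \tag{162}$$
$$\overset{(c)}{\le} P_e^{(n)}(\mathcal{C}_n) + \mathbb{E}\left[\sum_{m=1}^{M_n}\frac{\gamma}{M_n}Q_n\left(\mathcal{D}_{H^n,m}\right)\right] \tag{163}$$
$$\overset{(d)}{=} P_e^{(n)}(\mathcal{C}_n) + \frac{\gamma}{M_n}\,\mathbb{E}\left[Q_n\left(\bigcup_{m=1}^{M_n}\mathcal{D}_{H^n,m}\right)\right] \tag{164}$$
$$\le P_e^{(n)}(\mathcal{C}_n) + \frac{\gamma}{M_n} \tag{165}$$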

where (a) follows from the assumption of uniformly distributed messages over the code $\mathcal{C}_n$, (b) follows from the definition of error probability, (c) follows from the definition of the sets $\{B_{H^n}(m)\}_{m\in\mathcal{M}_n}$, and (d) follows from the assumption that $\{\mathcal{D}_{H^n,1},\ldots,\mathcal{D}_{H^n,M_n}\}$ are disjoint sets. The proof of this lemma is concluded by noting that $\gamma > 0$ and $Q_n$ are arbitrary, and that if $P_{Y^n|X^n,H^n}(\,\cdot\,|X^n,H^n) \ll Q_n$ does not hold for each $(X^n,H^n) \in \mathbb{C}^{K\times n}\times\mathbb{C}^{N\times K}$, then we simply define a set in $\mathbb{C}^{K\times n}\times\mathbb{C}^{N\times K}$ for which we take

$$\log\frac{P_{Y^n|X^n,H^n}(dY^n|X^n,H^n)}{Q_n(dY^n)} = \infty \tag{166}$$

yielding Borel subsets of $A_{H^n}$ which either have zero measure, or are subsets of sets with zero measure. ■

We now state and prove the following proposition, which is central to obtain the upper-bound part of Proposition 1.
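Proposition 3 (Second-order coding rate property): Let $\Gamma > 0$ and let $G(r|\beta,c,\Gamma)$ be defined as

$$G(r|\beta,c,\Gamma) = \inf_{\{P_{X^n}\}_{n=1}^{\infty},\, P_{X^n}\in\mathcal{S}^n_{\Gamma,\nu_n}}\ \lim_{\xi\downarrow 0}\ \limsup_{n \xrightarrow{(\beta,c)} \infty} \Pr\left[\sqrt{nK}\left(\frac{1}{nK}\log\frac{P_{Y^n|X^n,H^n}(dY^n|X^n,H^n)}{P_{Y^n|H^n}(dY^n|H^n)} - C\right) \le r+\xi\right] \tag{167}$$

with $\mathcal{S}^n_{\Gamma,\nu_n}$ defined in (79). Then, the following two statements hold:
(i) For every tuple $(r,\beta,c)$, $G(r|\beta,c,\Gamma)$ is non-increasing in $\Gamma$,
(ii) For every tuple $(r,\beta,c)$, $G(r|\beta,c,\Gamma)$ is a convex function in $\Gamma$.

Proof: From the definition of $G(r|\beta,c,\Gamma)$, for any $\Gamma' > \Gamma$ and any $\nu_n > 0$, since $\mathcal{S}^n_{\Gamma,\nu_n} \subset \mathcal{S}^n_{\Gamma',\nu_n}$, it is clear that $G(r|\beta,c,\Gamma') \le G(r|\beta,c,\Gamma)$. Therefore, $G(r|\beta,c,\Gamma)$ is non-increasing in $\Gamma$, proving part (i).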

To prove part (ii), we will show that for every tuple $(r,\beta,c)$, any number $0 \le \lambda \le 1$, and every pair of positive numbers $\Gamma_1$ and $\Gamma_2$,

$$G(r|\beta,c,\bar\Gamma(\lambda)) \le \lambda\, G(r|\beta,c,\Gamma_1) + \bar\lambda\, G(r|\beta,c,\Gamma_2) \tag{168}$$

where $\bar\lambda = 1-\lambda$ and $\bar\Gamma(\lambda) = \lambda\Gamma_1 + \bar\lambda\Gamma_2$. Since this is clear for $\lambda \in \{0,1\}$, we may assume $0 < \lambda < 1$. Let $\{P_{X_1^n}\}_{n=1}^{\infty} \in \mathcal{S}^n_{\Gamma_1,\nu_n}$ and $\{P_{X_2^n}\}_{n=1}^{\infty} \in \mathcal{S}^n_{\Gamma_2,\nu_n}$ be two arbitrary sequences of probability measures satisfying the power constraints $\Gamma_1$ and $\Gamma_2$, respectively.

Also define $P_{Y_i^n|H^n}$, $i = 1,2$, as the output of the channel $P_{Y^n|X^n,H^n}$ associated to the input $P_{X_i^n}$ and $P_{X_i^nY_i^n|H^n}$ as the joint distribution of $X_i^nY_i^n$ given $H^n$. Let then $X^n$ be the mixed random variable with probability measure

$$P^{(\lambda)}_{X^n} = \lambda P_{X_1^n} + \bar\lambda P_{X_2^n} \tag{169}$$

and define the joint probability measure

$$P^{(\lambda)}_{X^nY^n|H^n}(A\times B|H^n) = \int_A P_{Y^n|X^n,H^n}(B|X,H^n)\,P^{(\lambda)}_{X^n}(dX) \tag{170}$$
$$= \lambda\int_A P_{Y^n|X^n,H^n}(B|X,H^n)\,P_{X_1^n}(dX) + \bar\lambda\int_A P_{Y^n|X^n,H^n}(B|X,H^n)\,P_{X_2^n}(dX) \tag{171}$$
$$= \lambda P_{X_1^nY_1^n|H^n}(A\times B|H^n) + \bar\lambda P_{X_2^nY_2^n|H^n}(A\times B|H^n) \tag{172}$$

for every Borel set $(A,B) \in \mathbb{C}^{K\times n}\times\mathbb{C}^{N\times n}$, where $Y^n$ is the output generated by $X^n$ with distribution $P^{(\lambda)}_{Y^n|H^n}$.

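From the above assumptions, we have

$$\frac{1}{nK}\mathbb{E}\left[\mathrm{tr}\left(X^n(X^n)^{\mathsf H}\right)\right] = \lambda\frac{1}{nK}\mathbb{E}\left[\mathrm{tr}\left(X_1^n(X_1^n)^{\mathsf H}\right)\right] + \bar\lambda\frac{1}{nK}\mathbb{E}\left[\mathrm{tr}\left(X_2^n(X_2^n)^{\mathsf H}\right)\right] \tag{173}$$
$$\le \lambda\Gamma_1 + \bar\lambda\Gamma_2 \tag{174}$$
$$= \bar\Gamma(\lambda). \tag{175}$$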

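From these considerations, for $r$ real and $\xi > 0$, we now obtain the following chain of inequalities:

$$\Pr\left[\sqrt{nK}\left(\frac{1}{nK}\log\frac{P_{Y^n|X^n,H^n}(dY^n|X^n,H^n)}{P^{(\lambda)}_{Y^n|H^n}(dY^n|H^n)} - C\right) \le r+\xi\right]$$
$$\overset{(a)}{\le} \lambda\Pr\left[\left\{\sqrt{nK}\left(\frac{1}{nK}\log\frac{P_{Y^n|X^n,H^n}(dY_1^n|X_1^n,H^n)}{P^{(\lambda)}_{Y^n|H^n}(dY_1^n|H^n)} - C\right) \le r+\xi\right\}\cap B_{1,H^n}^c\right] + \bar\lambda\Pr\left[\left\{\sqrt{nK}\left(\frac{1}{nK}\log\frac{P_{Y^n|X^n,H^n}(dY_2^n|X_2^n,H^n)}{P^{(\lambda)}_{Y^n|H^n}(dY_2^n|H^n)} - C\right) \le r+\xi\right\}\cap B_{2,H^n}^c\right] + \lambda\Pr[B_{1,H^n}] + \bar\lambda\Pr[B_{2,H^n}] \tag{176}$$
$$\overset{(b)}{\le} \lambda\Pr\left[\sqrt{nK}\left(\frac{1}{nK}\log\frac{P_{Y^n|X^n,H^n}(dY_1^n|X_1^n,H^n)}{P_{Y_1^n|H^n}(dY_1^n|H^n)} - C\right) \le r+\xi+\xi'\right] + \bar\lambda\Pr\left[\sqrt{nK}\left(\frac{1}{nK}\log\frac{P_{Y^n|X^n,H^n}(dY_2^n|X_2^n,H^n)}{P_{Y_2^n|H^n}(dY_2^n|H^n)} - C\right) \le r+\xi+\xi'\right] + \lambda\Pr[B_{1,H^n}] + \bar\lambda\Pr[B_{2,H^n}] \tag{177}$$

where

• step (a) follows from the definition of $P^{(\lambda)}_{X^nY^n|H^n}$ and from the definitions of the sets

$$B_{i,H^n} = \left\{Y \in \mathbb{C}^{N\times n} : \frac{1}{nK}\log\frac{P_{Y_i^n|H^n}(dY|H^n)}{P^{(\lambda)}_{Y^n|H^n}(dY|H^n)} \le -\frac{\xi'}{\sqrt{nK}}\right\}, \quad i = 1,2,$$

for some $\xi' > 0$. Note here that $P_{Y_i^n|H^n} \ll P^{(\lambda)}_{Y^n|H^n}$ is ensured by definition of $Y^n$,

• step (b) follows from the definitions of the sets $B_{1,H^n}$ and $B_{2,H^n}$.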

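From the definition (167), taking the superior limit over $n$ on (177), then taking $\xi \downarrow 0$ and $\xi' \downarrow 0$ on left- then right-hand sides, and finally the infimum over $\{P^{(\lambda)}_{X^n}\} \in \mathcal{S}^n_{\bar\Gamma(\lambda),\nu_n}$ on the LHS, we obtain

$$G(r|\beta,c,\bar\Gamma(\lambda)) \le \lambda\lim_{\xi\downarrow 0}\lim_{\xi'\downarrow 0}\limsup_{n \xrightarrow{(\beta,c)} \infty}\Pr\left[\sqrt{nK}\left(\frac{1}{nK}\log\frac{P_{Y^n|X^n,H^n}(dY_1^n|X_1^n,H^n)}{P_{Y_1^n|H^n}(dY_1^n|H^n)} - C\right) \le r+\xi+\xi'\right] + \bar\lambda\lim_{\xi\downarrow 0}\lim_{\xi'\downarrow 0}\limsup_{n \xrightarrow{(\beta,c)} \infty}\Pr\left[\sqrt{nK}\left(\frac{1}{nK}\log\frac{P_{Y^n|X^n,H^n}(dY_2^n|X_2^n,H^n)}{P_{Y_2^n|H^n}(dY_2^n|H^n)} - C\right) \le r+\xi+\xi'\right] \tag{178}$$

where we used Lemma 2 in Appendix A-A to show that $\limsup_{n \xrightarrow{(\beta,c)} \infty}\Pr[B_{1,H^n}] = 0$ and $\limsup_{n \xrightarrow{(\beta,c)} \infty}\Pr[B_{2,H^n}] = 0$. Calling $\xi'' = \xi + \xi'$, the limits over $\xi' \downarrow 0$ and $\xi \downarrow 0$ can be compacted in a single limit over $\xi'' \downarrow 0$.

Taking now the infimum over the sequences $\{P_{X_1^n}\}_{n=1}^{\infty}$, $P_{X_1^n} \in \mathcal{S}^n_{\Gamma_1,\nu_n}$, and $\{P_{X_2^n}\}_{n=1}^{\infty}$, $P_{X_2^n} \in \mathcal{S}^n_{\Gamma_2,\nu_n}$, on the first and second terms of the RHS of the last equation, respectively, gives

$$G(r|\beta,c,\bar\Gamma(\lambda)) \le \lambda\, G(r|\beta,c,\Gamma_1) + \bar\lambda\, G(r|\beta,c,\Gamma_2) \tag{179}$$

which concludes the proof. ■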

B. Proof of Lemma 1 in Section III-B

Let $\mathcal{M}_n = \{1,\ldots,M_n\}$ denote the set of messages, which are assumed to be uniformly distributed, and let $P_{X^n} \in \mathcal{A}_n$ be an arbitrary probability measure. Let $\{X_m^n\}_{m\in\mathcal{M}_n}$ be independent random variables associated to the same probability measure $P_{X^n}$. The encoding function $\varphi$ is defined by the mapping $\varphi : m \mapsto X_m^n$ for every $m \in \mathcal{M}_n$. Note here that $X_m^n$ and $Y^n$ are distributed according to the joint probability measure $P_{X^nY^n}$.

Furthermore, the random variables $(X_m^n, Y^n)$, with $Y^n$ the output associated to $X_m^n$, are independent of $\{X_{m'}^n\}_{m'\ne m}$. Consider the maximum a posteriori (MAP) decoder $\phi_{H^n} : \mathbb{C}^{N\times n} \to \mathcal{M}_n \cup \{e\}$ that, for the fading channel $H^n$ and for a given received $Y^n \in \mathbb{C}^{N\times n}$, maps $Y^n$ to the message $m \in \mathcal{M}_n$ if

$$P_{X^n|Y^n,H^n}(X_m|Y^n,H^n) > P_{X^n|Y^n,H^n}(X_{m'}|Y^n,H^n) \quad \text{for all } m' \ne m \tag{180}$$

and, if no such $m \in \mathcal{M}_n$ exists, maps $Y^n$ to the error event $e$.
From the above definitions, we obtain an ensemble of block length $n$ codebooks of size $M_n$ that together with the MAP decoder form an ensemble of codes $\mathcal{C}_n$ with average error probabilities $P_e^{(n)}(\mathcal{C}_n)$, where the codewords of each code $\mathcal{C}_n$ are realizations $X_1,\ldots,X_{M_n}$ of the random variables $X_1^n,\ldots,X_{M_n}^n$. Let $\mathbb{E}[P_e^{(n)}(\mathcal{C}_n)]$ denote the average probability of error (over messages and codebooks) incurred over the channel $P_{Y^n|X^n,H^n}$ by the decoder defined above. We study next an upper bound on this average error probability. Let $\gamma > 0$. Then,



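$$\mathbb{E}\left[P_e^{(n)}(\mathcal{C}_n)\right] \overset{(a)}{=} \int \Pr\left[\max_{m'\ne m} P_{Y^n|X^n,H^n}(dY|X_{m'}^n,H^n) \ge P_{Y^n|X^n,H^n}(dY|X,H^n)\right] P_{X^nY^n}(dX,dY) \tag{181}$$
$$\overset{(b)}{=} \int \Pr\left[\left\{\max_{m'\ne m} P_{Y^n|X^n,H^n}(dY|X_{m'}^n,H^n) \ge P_{Y^n|X^n,H^n}(dY|X,H^n)\right\}\cap B_{H^n}^c\right] P_{X^nY^n}(dX,dY) + \int \Pr\left[\left\{\max_{m'\ne m} P_{Y^n|X^n,H^n}(dY|X_{m'}^n,H^n) \ge P_{Y^n|X^n,H^n}(dY|X,H^n)\right\}\cap B_{H^n}\right] P_{X^nY^n}(dX,dY) \tag{182}$$
$$\overset{(c)}{\le} \Pr\left[B_{H^n}^c\right] + \int \Pr\left[\max_{m'\ne m}\frac{P_{Y^n|X^n,H^n}(dY|X_{m'}^n,H^n)}{P_{Y^n|H^n}(dY|H^n)} > \gamma\right] P_{X^nY^n}(dX,dY) \tag{183}$$
$$\overset{(d)}{\le} \Pr\left[B_{H^n}^c\right] + \frac{M_n-1}{\gamma} \tag{184}$$
$$\overset{(e)}{\le} \Pr\left[\log\frac{P_{Y^n|X^n,H^n}(dY^n|X^n,H^n)}{P_{Y^n|H^n}(dY^n|H^n)} \le \log\gamma\right] + \frac{M_n}{\gamma} \tag{185}$$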

where

• step (a) follows from the definition of the MAP decoder, the uniform distribution of the messages, and from the fact that by construction the variables $\{X_m^n\}_{m\in\mathcal{M}_n}$ have measure $P_{X^n}$,

• step (b) uses the definition

$$B_{H^n} = \left\{(X,Y) : \log\frac{P_{Y^n|X^n,H^n}(dY|X,H^n)}{P_{Y^n|H^n}(dY|H^n)} > \log\gamma \text{ and } P_{Y^n|X^n,H^n}(\,\cdot\,|X,H^n) \ll P_{Y^n|H^n}(\,\cdot\,|H^n)\right\}, \tag{186}$$

• step (c) follows by noting that for all $(X,Y) \in B_{H^n}$, we have that

$$\frac{P_{Y^n|X^n,H^n}(dY|X,H^n)}{P_{Y^n|H^n}(dY|H^n)} > \gamma \tag{187}$$

provided that $P_{Y^n|X^n,H^n}(\,\cdot\,|X,H^n) \ll P_{Y^n|H^n}(\,\cdot\,|H^n)$,
• step (d) follows from the union bound and Markov's inequality, which yields

$$\Pr\left[\max_{m'\ne m}\frac{P_{Y^n|X^n,H^n}(dY|X_{m'}^n,H^n)}{P_{Y^n|H^n}(dY|H^n)} > \gamma\right] \le \sum_{m'\ne m}\Pr\left[\frac{P_{Y^n|X^n,H^n}(dY|X_{m'}^n,H^n)}{P_{Y^n|H^n}(dY|H^n)} > \gamma\right] \tag{188}$$
$$\le \frac{M_n-1}{\gamma} \tag{189}$$

where $m' \in \{1,\ldots,M_n\}\setminus\{m\}$,

• step (e) follows from the definition of $B_{H^n}$ and noting that if $P_{Y^n|X^n,H^n}(\,\cdot\,|X,H) \ll P_{Y^n|H^n}(\,\cdot\,|H)$ does not hold for each $(X,H) \in \mathbb{C}^{K\times n}\times\mathbb{C}^{N\times K}$, then we can define a singular set in $\mathbb{C}^{K\times n}\times\mathbb{C}^{N\times K}$ on which

$$\log\frac{P_{Y^n|X^n,H^n}(dY^n|X,H^n)}{P_{Y^n|H^n}(dY^n|H^n)} = \infty \tag{190}$$

yielding in (185) Borel sets which either have zero measure, or are subsets of sets with zero measure.

By noting that $\gamma > 0$ is arbitrary, (185) and the standard random coding argument conclude the proof of the lemma.
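The Markov step (d) in the proof above can be spelled out as follows. This is a sketch under the assumption that each competing codeword $X_{m'}^n$ is drawn from $P_{X^n}$ independently of $(Y^n, H^n)$, so that averaging the channel law over $X_{m'}^n$ returns the output law $P_{Y^n|H^n}$.

```latex
% For fixed (Y, H^n) and X_{m'}^n ~ P_{X^n} independent of Y:
\mathbb{E}_{X_{m'}^n}\!\left[\frac{P_{Y^n|X^n,H^n}(dY|X_{m'}^n,H^n)}{P_{Y^n|H^n}(dY|H^n)}\right]
  = \frac{\int P_{Y^n|X^n,H^n}(dY|X,H^n)\,P_{X^n}(dX)}{P_{Y^n|H^n}(dY|H^n)} = 1,
% hence, by Markov's inequality,
\Pr\!\left[\frac{P_{Y^n|X^n,H^n}(dY|X_{m'}^n,H^n)}{P_{Y^n|H^n}(dY|H^n)} > \gamma\right] \le \frac{1}{\gamma},
% and summing over the M_n - 1 competing codewords gives (M_n - 1)/\gamma.
```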



C. Proof of Proposition 1 in Section IV-B

This proof uses several intermediary results, mostly adapted from classical theorems of information spectrum theory, which are provided in Appendix A-A. We first prove the lower bound in (80). From Lemma 3 in Appendix A-A, we know that for any random variable $X^n$ uniformly distributed over a $(P_e^{(n)}, M_n, \Gamma)$-code $\mathcal{C}_n$ whose probability measure satisfies $P_{X^n} \in \mathcal{P}(\mathcal{S}_\Gamma^n)$, the average error probability must satisfy

$$P_e^{(n)}(\mathcal{C}_n) \ge \Pr\left[\log\frac{P_{Y^n|X^n,H^n}(dY^n|X^n,H^n)}{Q_n(dY^n)} \le \log\gamma\right] - \frac{\gamma}{M_n} \tag{191}$$

for each $n = 1,2,\ldots$, $\gamma > 0$, where $Q_n$ is $H^n$, $X^n$-measurable and takes values in $\mathcal{P}(\mathbb{C}^{N\times n})$. Let us choose $\gamma$ as

$$\frac{1}{nK}\log\gamma = \frac{1}{nK}\log M_n - \frac{\xi}{\sqrt{nK}} \tag{192}$$

for some $\xi > 0$. We now set the coding rate

$$\frac{1}{nK}\log M_n = C + \frac{r}{\sqrt{nK}} \tag{193}$$
for some real $r$. Then, combining (191)-(193), we obtain

$$P_e^{(n)}(\mathcal{C}_n) \ge \Pr\left[\sqrt{nK}\left(\frac{1}{nK}\log\frac{P_{Y^n|X^n,H^n}(dY^n|X^n,H^n)}{Q_n(dY^n)} - C\right) \le r-\xi\right] - \exp\left(-\sqrt{nK}\,\xi\right). \tag{194}$$
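For completeness, the substitution that turns (191) into (194) can be written out as a short worked step. It uses only the choices (192) and (193).

```latex
% Threshold: insert (193) into (192)
\frac{1}{nK}\log\gamma = C + \frac{r}{\sqrt{nK}} - \frac{\xi}{\sqrt{nK}}
\;\Longrightarrow\;
\left\{\log\frac{P_{Y^n|X^n,H^n}}{Q_n} \le \log\gamma\right\}
 = \left\{\sqrt{nK}\left(\frac{1}{nK}\log\frac{P_{Y^n|X^n,H^n}}{Q_n} - C\right) \le r - \xi\right\},
% Penalty term: from (192)
\frac{\gamma}{M_n} = \exp\left(\log\gamma - \log M_n\right)
 = \exp\left(-nK\,\frac{\xi}{\sqrt{nK}}\right) = \exp\left(-\sqrt{nK}\,\xi\right).
```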



Taking the limit superior over $n$ on the last equation, we obtain

$$\limsup_{n \xrightarrow{(\beta,c)} \infty} P_e^{(n)}(\mathcal{C}_n) \ge \limsup_{n \xrightarrow{(\beta,c)} \infty}\Pr\left[\sqrt{nK}\left(\frac{1}{nK}\log\frac{P_{Y^n|X^n,H^n}(dY^n|X^n,H^n)}{Q_n(dY^n)} - C\right) \le r-\xi\right]. \tag{195}$$

As this is true for each $\xi > 0$ and $Q_n$ as defined above, we can take $\xi \downarrow 0$ followed by the supremum over $Q_n$ on the RHS of (195). Taking then the infimum over the codes on the RHS then LHS, we conclude that

$$\mathbb{P}_e(r|\beta,c,\Gamma) \ge F(r|\beta,c,\Gamma) \tag{196}$$

which proves part (i) of the proposition.

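We now prove part (ii) for the upper bound in (81). The following definitions are needed for the proof. For any number $0 < \delta < \Gamma$, we consider a sequence $\{P_{X^n}\}_{n=1}^{\infty}$ of arbitrary probability measures $P_{X^n} = \prod_{i=1}^{n}\prod_{k=1}^{K} p_{x_{ki}}$ in the set $\mathcal{S}^n_{\Gamma-\delta,\nu_n}$ for some sequence $\nu_n \to 0$, that is

$$\frac{1}{nK}\mathbb{E}\left[\mathrm{tr}\left(X^n(X^n)^{\mathsf H}\right)\right] \le \Gamma-\delta, \qquad \frac{1}{(nK)^2}\sum_{k,i}\mathrm{Var}\left(|x_{ki}|^2\right) \le \nu_n \tag{197}$$

as $n \xrightarrow{(\beta,c)} \infty$. The fading conditional output probability measure induced by this input is denoted $P_{Y^n|H^n}$ and the joint probability $P_{X^nY^n|H^n}$. By Markov's inequality, we then have, as $n \xrightarrow{(\beta,c)} \infty$,

$$\Pr\left[\frac{1}{nK}\mathrm{tr}\left(X^n(X^n)^{\mathsf H}\right) \le \Gamma\right] \ge 1 - \frac{1}{\delta^2(nK)^2}\sum_{k,i}\mathrm{Var}\left(|x_{ki}|^2\right) \to 1. \tag{198}$$

It follows from the above that $p_n = P_{X^n}(\mathcal{S}_\Gamma^n) \to 1$ as $n \xrightarrow{(\beta,c)} \infty$. Let us denote by $\tilde P_{X^n}$ the probability measure induced by $P_{X^n}$ over the set $\mathcal{S}_\Gamma^n$, defined for any Borel set $B$ by

$$\tilde P_{X^n}(B) = \frac{1}{p_n}P_{X^n}\left(B\cap\mathcal{S}_\Gamma^n\right) \tag{199}$$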



for which $\tilde P_{X^n}(\mathcal{S}_\Gamma^n) = 1$. Take now $n_0$ large enough so that $p_n > 0$ for all $n > n_0$. Then, for arbitrary Borel sets $B \in \mathbb{C}^{K\times n}$ and $D \in \mathbb{C}^{N\times n}$, and any $H^n \in \mathbb{C}^{N\times K}$, it holds that

$$\tilde P_{Y^n|H^n}(D|H^n) = \int P_{Y^n|X^n,H^n}(D|X^n,H^n)\,\tilde P_{X^n}(dX^n) \tag{200}$$
$$= \frac{1}{p_n}\int_{\mathcal{S}_\Gamma^n} P_{Y^n|X^n,H^n}(D|X^n,H^n)\,P_{X^n}(dX^n) \tag{201}$$
$$\le \frac{1}{p_n}\int P_{Y^n|X^n,H^n}(D|X^n,H^n)\,P_{X^n}(dX^n) \tag{202}$$
$$= \frac{1}{p_n}P_{Y^n|H^n}(D|H^n) \tag{203}$$

which implies $\tilde P_{Y^n|H^n}(\,\cdot\,|H^n) \ll P_{Y^n|H^n}(\,\cdot\,|H^n)$ and, similarly, the joint probability measure satisfies

$$\tilde P_{X^nY^n|H^n}(B\times D|H^n) = \int_{B\cap\mathcal{S}_\Gamma^n} P_{Y^n|X^n,H^n}(D|X^n,H^n)\,\tilde P_{X^n}(dX^n) \tag{204}$$
$$\le \frac{1}{p_n}\int_{B} P_{Y^n|X^n,H^n}(D|X^n,H^n)\,P_{X^n}(dX^n) \tag{205}$$
$$= \frac{1}{p_n}P_{X^nY^n|H^n}(B\times D|H^n) \tag{206}$$

which implies $\tilde P_{X^nY^n|H^n}(\,\cdot\,|H^n) \ll P_{X^nY^n|H^n}(\,\cdot\,|H^n)$.

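From Lemma 1 in Section III-B, for $\mathcal{A}_n = \mathcal{P}(\mathcal{S}_\Gamma^n)$, we know that there exists a $(P_e^{(n)}, M_n, \Gamma)$-code $\mathcal{C}_n$ whose average error probability satisfies

$$P_e^{(n)}(\mathcal{C}_n) \le \inf_{\gamma>0}\left\{\Pr\left[\frac{1}{nK}\log\frac{P_{Y^n|X^n,H^n}(dY^n|X^n,H^n)}{\tilde P_{Y^n|H^n}(dY^n|H^n)} \le \frac{1}{nK}\log\gamma\right] + \frac{M_n}{\gamma}\right\} \tag{207}$$

for every $n = 1,2,\ldots$, the probability being taken over $X^n$ which satisfies $\tilde P_{X^n}(\mathcal{S}_\Gamma^n) = 1$. We will now relate $P_e^{(n)}(\mathcal{C}_n)$ to the original distribution $P_{X^n}$ using (199). To this end, from (203) we obtain

$$\frac{1}{nK}\log\frac{1}{P_{Y^n|H^n}(B|H^n)} - \frac{1}{nK}\log\frac{1}{p_n} \le \frac{1}{nK}\log\frac{1}{\tilde P_{Y^n|H^n}(B|H^n)} \tag{208}$$

for any Borel set $B$. Now set

$$\frac{1}{nK}\log\gamma = \frac{1}{nK}\log M_n - \frac{1}{nK}\log\frac{1}{p_n} + \frac{\xi}{\sqrt{nK}} \tag{209}$$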
for some $\xi > 0$. Then, we have the following chain of inequalities:

$$\Pr\left[\frac{1}{nK}\log\frac{P_{Y^n|X^n,H^n}(dY^n|X^n,H^n)}{\tilde P_{Y^n|H^n}(dY^n|H^n)} \le \frac{1}{nK}\log\gamma\right] + \frac{M_n}{\gamma}$$
$$\overset{(a)}{=} \Pr\left[\frac{1}{nK}\log\frac{P_{Y^n|X^n,H^n}(dY^n|X^n,H^n)}{\tilde P_{Y^n|H^n}(dY^n|H^n)} \le \frac{1}{nK}\log M_n - \frac{1}{nK}\log\frac{1}{p_n} + \frac{\xi}{\sqrt{nK}}\right] + \frac{\exp\left(-\sqrt{nK}\,\xi\right)}{p_n} \tag{210}$$
$$\overset{(b)}{\le} \Pr\left[\frac{1}{nK}\log\frac{P_{Y^n|X^n,H^n}(dY^n|X^n,H^n)}{P_{Y^n|H^n}(dY^n|H^n)} \le \frac{1}{nK}\log M_n + \frac{\xi}{\sqrt{nK}}\right] + \frac{\exp\left(-\sqrt{nK}\,\xi\right)}{p_n} \tag{211}$$
$$\overset{(c)}{\le} \frac{1}{p_n}\Pr\left[\frac{1}{nK}\log\frac{P_{Y^n|X^n,H^n}(dY^n|X^n,H^n)}{P_{Y^n|H^n}(dY^n|H^n)} \le \frac{1}{nK}\log M_n + \frac{\xi}{\sqrt{nK}}\right] + \frac{\exp\left(-\sqrt{nK}\,\xi\right)}{p_n} \tag{212}$$

where (a) follows by replacing (209) in (207), (b) follows from (208), and (c) follows from (206).

Remark that, if $P_{Y^n|X^n,H^n}(\,\cdot\,|X^n,H^n) \ll P_{Y^n|H^n}(\,\cdot\,|H^n)$ does not hold for some $(X^n,H^n) \in \mathbb{C}^{K\times n}\times\mathbb{C}^{N\times K}$, then by definition

$$\log\frac{P_{Y^n|X^n,H^n}(dY^n|X^n,H^n)}{P_{Y^n|H^n}(dY^n|H^n)} = \infty \tag{213}$$

and, thus,

$$\Pr\left[\log\frac{P_{Y^n|X^n,H^n}(dY^n|X^n,H^n)}{P_{Y^n|H^n}(dY^n|H^n)} \le \log\gamma\right] = 0 \tag{214}$$

for this $(X^n,H^n)$, which does not alter the inequality (212).

For some $r$ real, we choose the coding rate

$$\frac{1}{nK}\log M_n = C + \frac{r}{\sqrt{nK}}. \tag{215}$$

By combining (207) and (212), taking the superior limit on $n$, then $\xi \downarrow 0$ on the RHS, and the infimum over the codes on the LHS, we obtain

$$\mathbb{P}_e(r|\beta,c,\Gamma) \le \lim_{\xi\downarrow 0}\limsup_{n \xrightarrow{(\beta,c)} \infty}\Pr\left[\sqrt{nK}\left(\frac{1}{nK}\log\frac{P_{Y^n|X^n,H^n}(dY^n|X^n,H^n)}{P_{Y^n|H^n}(dY^n|H^n)} - C\right) \le r+\xi\right] \tag{216}$$

where we used $p_n \to 1$ as $n \xrightarrow{(\beta,c)} \infty$.

Furthermore, (216) holds for any arbitrary sequence of probability measures $\{P_{X^n}\}_{n=1}^{\infty}$ satisfying (197). Hence, for any $\delta > 0$ it holds that

$$\mathbb{P}_e(r|\beta,c,\Gamma) \le G(r|\beta,c,\Gamma-\delta). \tag{217}$$

Finally, from Proposition 3 (ii) in Appendix A-A, we know that the function $G(r|\beta,c,\Gamma)$ is convex in $\Gamma > 0$, so that the RHS of (217) is continuous with respect to $\delta$, which yields

$$\mathbb{P}_e(r|\beta,c,\Gamma) \le \lim_{\delta\downarrow 0} G(r|\beta,c,\Gamma-\delta) \tag{218}$$
$$= G(r|\beta,c,\Gamma). \tag{219}$$

This concludes the proof.
Appendix B
Proofs of the main random matrix results

One of the major technical contributions of this article consists in a thorough analysis of the asymptotic statistics of $I_{N,K}^{(n)}$ as defined in (86) under different assumptions about the distribution $P_{X^n}$ of $X^n$. Theorem 3 and Proposition 2 in Section IV deal with the case of $P_{X^n} \in \mathcal{P}(\mathcal{S}^n)$ and are needed in the proof of the lower-bound part of Theorem 1 in Section IV-B1. Proposition 2
evidences that I^" K as defined in (1851) has a variance which scales as O (l + -^tr (A n ) 2 ). Thus, 
depending on X n G S n , this variance can grow infinitely large or not. Theorem |3] shows that for 
certain distributions Px™ G V (S n ), satisfying 9 n > r\ almost surely for some random quantity 
n and for a constant 77 > 0, the random variable I§" K + ( -^trA n centered around C and scaled 
by ^p^- satisfies a CLT The case of Gaussian inputs X n (for which F x ^ ^ V{S n )) is treated 
independently in Theorem 4 in Section IV-B2. This theorem states that the random variable $I_{N,K}^{(n)}$ satisfies a CLT with asymptotic mean $C$ and variance $\theta_+^2$. This result is needed in the proof of the upper-bound part of Theorem 1 and the proof of Theorem 2 in Sections IV-B2 and IV-C, respectively.

In order to derive these results, we fundamentally rely on the fact that the random matrices $W^n$ and $H^n$ are Gaussian by assumption. This allows us to use the powerful Gaussian tools developed by Pastur, see, e.g., [36], [23], which are made of two ingredients: an integration by parts formula (Lemma 8 and Remark 9 in Appendix D), which provides an alternative expression to compute the expectation of functionals of Gaussian variables, and the Poincaré-Nash inequality (Lemma 9 and Remark 10 in Appendix D), which is used to bound the variance of such functionals. The
derivation of the CLTs is based on the characteristic function approach as explained in great 
detail in ll33Tl. E51. 

We need to mention at this point that a proof of Theorem 4 was sketched in [37, Theorem 2 (ii)] following a different approach. However, more precise estimators can be deduced with the Gaussian method described above. This method allows us to show in particular that the characteristic function of $\frac{\sqrt{nK}}{\theta_+}\left(I_{N,K}^{(n)} - C\right)$ converges at a rate $O(n^{-1})$, in contrast to $o(1)$ in [37]. This suggests a convergence speed of order $O(n^{-1})$ of the associated distribution function, although we do not prove this result.

This appendix is structured as follows: In Appendix B-A, we introduce some additional notations and useful identities which are needed throughout the rest of the article. We then successively prove Theorem 3 in Appendix B-B, Theorem 4 in Appendix B-C, and Proposition 2 in Appendix B-D.



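A. Preliminaries

For readability, we often drop the index $n$ in matrix notations when there is no confusion, e.g., we write $H$ instead of $H^n$. The central object of Theorems 3 and 4 is the real quantity

$$\Gamma_n \triangleq \sqrt{nK}\, I_{N,K}^{(n)} = \sqrt{\frac{n}{K}}\log\det\left(I_N + \frac{1}{\sigma^2}\frac{HH^{\mathsf H}}{K}\right) + \frac{1}{\sqrt{nK}}\mathrm{tr}\, Q(\sigma^2)\left(\frac{HX}{\sqrt K} + \sigma W\right)\left(\frac{HX}{\sqrt K} + \sigma W\right)^{\mathsf H} \tag{220}$$
$$\qquad - \frac{1}{\sqrt{nK}}\mathrm{tr}\, WW^{\mathsf H}. \tag{221}$$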

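We start with the definition of two matrices, the so-called "resolvents" of $K^{-1}HH^{\mathsf H}$ and $K^{-1}H^{\mathsf H}H$, respectively, which will be of repeated use:

$$Q(x) = \left(\frac{1}{K}HH^{\mathsf H} + xI_N\right)^{-1} \in \mathbb{C}^{N\times N} \tag{222}$$
$$\tilde Q(x) = \left(\frac{1}{K}H^{\mathsf H}H + xI_K\right)^{-1} \in \mathbb{C}^{K\times K} \tag{223}$$

for $x > 0$. One can easily verify that:

$$Q(x)\frac{HH^{\mathsf H}}{K} = I_N - xQ(x), \qquad \tilde Q(x)\frac{H^{\mathsf H}H}{K} = I_K - x\tilde Q(x). \tag{224}$$

We will also rely several times on the following identities:

$$Q(x)H = H\tilde Q(x), \qquad \tilde Q(x)H^{\mathsf H} = H^{\mathsf H}Q(x) \tag{225}$$
$$Q(x)\frac{HH^{\mathsf H}}{K} = \frac{HH^{\mathsf H}}{K}Q(x), \qquad \tilde Q(x)\frac{H^{\mathsf H}H}{K} = \frac{H^{\mathsf H}H}{K}\tilde Q(x) \tag{226}$$
$$Q(x)Q(y) = Q(y)Q(x), \qquad \tilde Q(x)\tilde Q(y) = \tilde Q(y)\tilde Q(x). \tag{227}$$

Using the above relations, it is easy to prove the following bounds on the spectral norm:

$$\|Q(x)\| \le \frac{1}{x}, \qquad \|\tilde Q(x)\| \le \frac{1}{x} \tag{228}$$
$$\left\|Q(x)\frac{HH^{\mathsf H}}{K}\right\| \le 1, \qquad \left\|\tilde Q(x)\frac{H^{\mathsf H}H}{K}\right\| \le 1. \tag{229}$$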
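The first resolvent identity and the two norm bounds follow in one line each. The short derivation below is a sketch that writes $\lambda_1,\ldots,\lambda_N \ge 0$ for the eigenvalues of $K^{-1}HH^{\mathsf H}$, a notation introduced here for illustration only.

```latex
% Identity: multiply the definition of Q(x) by its inverse
Q(x)\left(\frac{HH^{\mathsf H}}{K} + xI_N\right) = I_N
\;\Longrightarrow\;
Q(x)\frac{HH^{\mathsf H}}{K} = I_N - xQ(x).
% Norm bounds: Q(x) and Q(x) HH^H / K are Hermitian with eigenvalues
\frac{1}{\lambda_i + x} \le \frac{1}{x}
\qquad\text{and}\qquad
\frac{\lambda_i}{\lambda_i + x} \le 1,
\qquad i = 1,\ldots,N,\quad x > 0.
```

The same argument applies to the second resolvent with $K^{-1}H^{\mathsf H}H$ in place of $K^{-1}HH^{\mathsf H}$.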
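With the help of (224), we can decompose $\Gamma_n$ in the following way:

$$\Gamma_n = \Gamma_{n,1} + \Gamma_{n,2} + \Gamma_{n,3} + \Gamma_{n,4} \tag{230}$$

where

$$\Gamma_{n,1} = \sqrt{\frac{n}{K}}\log\det\left(I_N + \frac{1}{\sigma^2}\frac{HH^{\mathsf H}}{K}\right) + \frac{1}{\sqrt{nK}}\mathrm{tr}\, Q\frac{HXX^{\mathsf H}H^{\mathsf H}}{K} \tag{231}$$
$$\Gamma_{n,2} = -\frac{1}{\sqrt{nK}}\mathrm{tr}\, Q\frac{HH^{\mathsf H}}{K}WW^{\mathsf H} \tag{232}$$
$$\Gamma_{n,3} = \frac{\sigma}{\sqrt{nK}}\mathrm{tr}\, Q\frac{HXW^{\mathsf H}}{\sqrt K} \tag{233}$$
$$\Gamma_{n,4} = \frac{\sigma}{\sqrt{nK}}\mathrm{tr}\, Q\frac{WX^{\mathsf H}H^{\mathsf H}}{\sqrt K} \tag{234}$$

and where we have defined $Q = Q(\sigma^2)$ to simplify the notations.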

B. Proof of Theorem 3 in Section IV-B1
Outline of the proof:
It is our goal to prove that, under the hypotheses of the theorem,



>(f) = E 



e™ (235) 



for t G R as n oo, where 



A 



v^K - Co-^tr . (236) 
This will imply, by Levy's continuity theorem [24, Theorem 16.3], that 

r n -/i n ^^ (0>1) (237) 

which is equivalent to the statement of the theorem. The main difficulty arises from the calculation of the expectation in (235), which must be taken with respect to the three random matrices $W$, $H$, and $X$. Since the direct computation of $\phi_n(t)$ is intractable, we calculate its derivative with respect to $t$, leading to a differential equation which must be integrated. In order to further simplify the analysis, we split the computation of the expectation in three steps by successively considering the conditional expectations with respect to each of the matrices. These expectations are developed by the integration by parts formula (Lemma 8 in Appendix D), which yields terms that are either further developed or shown to be asymptotically negligible by bounding their variance with the help of the Poincaré-Nash inequality (Lemma 9 in Appendix D). The analysis
makes use of several auxiliary lemmas which are summarized in Appendix [Ql In more detail, 
the proof consists of the following three main steps: 

1) We first take the expectation over W by fixing X G S n and H G <C NxK ; we define the 



function (j)f n,un {t) 



E 



dt 



X n .H n 



and show that 

X«H«\2 . ;i2, X",H"\ j,X",H", 



(t)+5f' H "(t) (238) 



^(n- 1 ), and £ x "' H "(t) = <3(™~ 2 ) 



for some /£" > H " = O(n), X " H " = O(l). ^n" 3 " 
which must be carefully controlled. This establishes a differential equation for x "' H "(t) 
the solution of which allows us to obtain an estimate of xn ' H "(t) under the form e^*' x ' H ) 
(i.e., with no expectation over W). Note importantly that, although the term /t xn ' H ™ is of 
order O^n^ 1 ) and will not play a role at the end of the calculus, it needs to be maintained 
as the estimation error xn,Hn (t) — e-^' ,x ' H ), which is of the same order of magnitude as 
(t), will increase by a factor n when we take its expectation over H (this is due to 



X» H" 



being of order 0(n)). 



2) We then compute the expectation over H: we introduce the function X " (t) — E [0 X "' H ™ (t)] . 
Working mainly with the tractable estimator e^*' x,H ) of x ™' H "(t) as developed in step 1), 
instead of </> x ' I,Hn (t) itself, we prove in a similar fashion that 

= Or - 1 (erf) tir® + eT(t) (239) 

for some /i x " = 0(n), 6* n = 0(1), and £ x "(t) = (9(n _1 ). This establishes a second 
differential equation. 

3) We finally integrate (12391 ) and show that, if 6> x " > rj > for all n (as per the theorem 
assumption), 



IX" <- 



E 



rr 



e H ™ 



+ C 71 



(240) 



(as n — — > oo). Finally, if 0* > r/ > holds for almost every realization X of a random 
matrix X n with law Px™ G V(S n ) and for all n, we conclude that (|240l) also holds for the 
function <p n (t) = E fe n (t)l = E [e^ (r "" Mri) l which finally proves (12351 We will see 



in the detailed proof of this third step that 6 |X ™ > 77 > needs not be true for arbitrary 

¥ X n G V{S n ). 
Step 1:

In a first step, we consider the expectation over $W$ by treating $H \in \mathbb{C}^{N\times K}$ and $X \in \mathcal{S}^n$ as fixed.



We define the function 0* n,H " (t) = E 



which we would like to express as a differential 



equation of the form " Vn Qt {t) = f (X, H, t) 0^" ,H "(t) + £*"' H "(*) for some functional / 
and quantity e^ n ' nn (t) which vanishes asymptotically. Since r^™' H " is real, 0*™ ,Hn (— t) = 
(f)*™ ' n ™ (t)* , so that it is sufficient to consider t > for the rest of the proof. By (|2301 ). 



(jX",H» 



(*) 



X",H" 



dt 



(241) 



k=l 



Since T n { is independent of W , 

E 



v'fl TjTI 

r x»,H» ur^ 

1 n,l e 



X",H n ,X",H r 



r 

1 n,l 



(t). 



The term in T 



E 



X°,H" 

n,2 

.X n ,H' 

n,2 



is studied as follows: 
1 



1 
1 



=E 



AT AT 



EE 

fc=l i=l 
AT AT 

EE 

k=l i=l 



Q 



HH H 

K 

HH H1 



E 



k'i 



X 



^E \w i:j WZ/ tr * 



ki j—i 



(242) 

(243) 
(244) 

(245) 



We now use the integration by parts formula (Lemma 8 in Appendix D) to develop the individual

X n I- 

terms E W^W^e 1 * 1 " ' 



E 



X H 



as follows: 

= S ik E 



+ itE 



AtVi, 



kj dW* 



(246) 



The derivatives 



dW* 



and 



dWi- 



can be computed by straightforward application of the derivation rules provided in Lemma 10 in Appendix D.
dr n>1 _ ar nA _ dr n>3 _ or nA 



dW* 
dT 



n.2 



'J 



dW* 



dw; 3 
<9r TO , 3 



VriK 
1 



j ij 



a 



K 



or 



ft. i 



a 



K 
K 



j' 



Using (12301) together with the derivatives (12371 . (I248T ). (12501) in (12461) . we obtain 



E 



Hi „itr„ 



n 



<9r 



X",H n 

n,2 



dW*- 



+ 



<9r 



X",H" ' 

n,3 



aw;?. 



n<W£ (t) - it 



=E 



Q 



HFT 



(7 



/A- 



;A: 



Replacing the last result in (244) yields



(247) 
(248) 
(249) 
(250) 
(251) 



(252) 
(253) 



E 



1 n,2 e 



77, „ HH v „ , . 

— trQ ^ x H 

K K 



+ itE 



1 / HH H \ TjrTjrH a HH H H 



h i „wr; 



xiy M e 



(254) 



We will now individually treat the second and third term on the RHS of the last equation. For the second term, using the same steps as above, we arrive at



E 



nA 



tr Q 



HH H 



K 



WW H e itr " 



N N 



k=l i=l 



17* Q 



K 



HH 



Q 



H\ 2 



K 



HH H 

A 



E 



\WW H ] 



hi e i*r^ 

ik 



(255) 



ki 



it 



[nK) 



■E 



HH 



H\ 3 



tr Q — 77- WW" — crtr ( Q 



HH 



H\ 2 



H 



h l Atr„ 



A" 



tr Q 



HH 



H\ 2 



where 



A 



(*) 



iX",H' 
n 



HH 



H\ 3 



A' 



(256) 
(257) 



-it! 



+ itE 



tr Q 



HH H \ 3 (WW" 



A 



n 



Ijv I e 



(7 



(nA) 



-trQ Q 



HH H \ 2 H 



A 



A 



X^ H e iir " ' 



(258) 



Consider now the third term on the RHS of (1254T) and define T = Q5^Q^X. Then, 

"drr< H 



(7 



:E 



trW H e iir 



X",H" 



N n 
i=l j=l 



AtVi 



dWij 



(259) 



o -HH „HXX H ,Y»m, . 
it^=trQ 2 ^-^Q — 0* (0 +eJ 2 ,H (t) (260) 



A 



nA 



where 



?n,2 (t) = -It- 



nK) 



E 



trQ Q 



HH 



H\ 2 



H 



A / vA 



XVk e 



(261) 



Combining the last results, we arrive at 



E 



r x",H" itr* 

1 n,2 e 



77> _ HH Yn Tin / , 

-trQ— ^ (f) 



+ it— tr Q 



A 



HH 



H\ 2 



A 



+ t 2 



tr Q 



HH H 



A 



+ 



a 



HH H HXX H H H 



VnK 3 



trQ 2 — Q 



nA 



. X",H n /,\ X n ,H n /,\ 



(262) 
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E 



We now consider the terms in $\Gamma_{n,4}^{X^n,H^n}$ and $\Gamma_{n,3}^{X^n,H^n}$. Using similar calculus as above,

XXn tt77 
,n itl n 

1 n,4 e 



E 



trQWOT 



v 7 ^ 

AT n 



H H 



it- 



er 



'nK L 



it- 



er 



TV n 

EE 

7=1 j=l 

TV n 

EE 

7=1 j=l 



x h 



K 
H H 



Q 



E 



W ijG 



>K 
H H 



31 



or 



X n ,H" 



(263) 
(264) 

(265) 



a 



Vn~K 



n 



Vn~K 



= lt K trQ nK " W " ltE 



fT :tr Q!i^^X H ^Qe lii " 



(266) 
(267) 



nK ^ K 

Doing the same calculus for the second term on the RHS of the last equation, one arrives at 



E 



n I I I I ' < XJH n n 

trQ M^_^x H i^Q e itr " ' 



nK 



K 



it 



HH H „ HXX H H H 



y/nK 3 



trQ 2 -^Q 



nK 



(268) 



where 



_X n ,H n 

-n,4 



(0 



-if- 



(nK) 



trQ Q 



HH 



H\ 2 



if 



H h 



K 



Thus, 



E 



n,4 



. ^ _ 9 HXX^H^ vn Tan , . 9 (7 



(269) 



ttttH UYYHuH 

9 I 1 I 1 J1AA XI -v-n Tin . . . Yn TTn 

trQ'^^Q — <Pn * H (t)-iteM' (*)■ 



if 



e;„„„ rX",H° / r X",H" , ^ 

SlnCe r n,3 = ( r n 4 ) . E 



n,4 



1 n,3 e 



E 



1 n,4 e 



(270) 



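Gathering all pieces together as a polynomial in $t$, we obtain a first differential equation of $\phi_n^{X^n,H^n}(t)$:

$$\frac{\partial \phi_n^{X^n,H^n}(t)}{\partial t} = \left(i\mu_n^{X^n,H^n} - t\,(\theta_n^{X^n,H^n})^2 + it^2\kappa_n^{X^n,H^n}\right)\phi_n^{X^n,H^n}(t) + \bar\varepsilon_n^{X^n,H^n}(t) \tag{271}$$

where, with $A = I_K - \frac{1}{n}XX^H$,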



X n ,H" 

H'n 



3 X n ,H n \ 2 



n . I 1 HH 
-logdet [In + - 2 ^ 

rH \ 2 



,n HAH H 



,! 1 / HH H \ " 2a- „,HXX"H H 



K 



nK 



X n ,H n 



VnA 3 

ef l ' Hn (t) = it 2 E 



1 / HH H 



3d 2 2 HH H HXX H H H 
+ trQ 2 — — Q- 



1 / n HH 
tr Q 



y/nK* ' " # 
H ^ 3 /PW H 



A" 



\ n 



LiV 



nA 

3(7 



trQ Q 



(272) 
(273) 
(274) 

HH H \ 2 B.XW" 



a 



ztrQ Q 



HH H \VX H H H 



JtT 



Kn 
(275) 



^nJp" " \" K J v K„ 
Although (271) is already sufficient to continue this proof, we will make a further refinement
of the term $\bar\varepsilon_n^{X^n,H^n}(t)$ which will be required in the proof of Theorem 4. To this end, we develop
$\bar\varepsilon_n^{X^n,H^n}(t)$ with the help of the integration by parts formula (Lemma 8 in Appendix D), similar
to the calculus above. This leads after some straightforward computations to



-X",H' ; 



(0 = *x 



3„X",H" iX",^ 



+ < (t) 



(276) 



where 



v 



X",H" 



1 / HH 
tr Q 



nK 2 



h\ 4 



£ * ' (£) = t d E 



A" 



tr Q 



4a 2 
nA 2 



trQ 2 Q 



HH H \ 2 HXX H H H 



HH H \ /WW H 



A 



A 



nA 



(277) 



4(7 HH H \ 3 HXW H 
trQ Q 



trQ Q 



HH H \ 3 WX H H H 



nA 2 



A 



"A 2 ' v V " A' / VAn 
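Combining (271) and (276), we finally arrive at

$$\frac{\partial \phi_n^{X^n,H^n}(t)}{\partial t} = \left(i\mu_n^{X^n,H^n} - t\,(\theta_n^{X^n,H^n})^2 + it^2\kappa_n^{X^n,H^n} + t^3\nu_n^{X^n,H^n}\right)\phi_n^{X^n,H^n}(t) + \varepsilon_n^{X^n,H^n}(t)$$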



An 
(278) 



(279) 



Let us now have a closer look at the quantities $\theta_n^{X^n,H^n}$, $\kappa_n^{X^n,H^n}$, $\nu_n^{X^n,H^n}$, and $\varepsilon_n^{X^n,H^n}(t)$
individually. Using the identities and bounds presented at the beginning of this proof, one can
verify that



0< (ar ,Hn ) 2 <- + — trXX H 
~ K n > ~ K nK 



< KT' nn < 



N 



VnK 3 \fn 3 K 3 



trXX H 



< ^" H " <^L + -^trXX H . 



nK 2 



n- 



> K 2 



Based on Remark 8 in Appendix D, we can bound the absolute value of $\varepsilon_n^{X^n,H^n}(t)$ as
kf' H "(0l < * 3 




n / HH H \ HXH /H 

trQ Q 



nK 2 



K 



Kn 



By Lemma 4 (ii) in Appendix C-A, it follows that



Var 



nK 2 



tr Q 



HH H \ fWW H 



K 



n 



< 



Var 
2 



1 / HH H \ WW" 
tr Q 



nK 2 



< 



n 3 K A 
2N 



tr Q 



HH 



K 

H\ § 



7\ 



3^4- 



Similarly, by Lemma 0] (i) in Appendix IC-A[ it follows that 



Var 



a 



nK 2 



trQ Q 



HH H \ 3 HXiy H 



K 



Kn 



i HH 
tr(Q^ 



n 4 fsT 4 



H ^ n HXX H H H n 

T^" 



if 



< 
< 



a 



,H H H. 



n 4 ^ 4 
1 



trQ 



n 4 K 4 

Replacing (286) and (289) in (283), we then obtain

t 3 



K 
trXX H . 



-XX h 



_X n ,H r 



\/n 3 K 



1 



2N 

~K +5 \ ii h 



trXX H . 



(280) 
(281) 

(282) 



(283) 



(284) 

(285) 
(286) 

(287) 

(288) 
(289) 

(290) 
Since $\frac{1}{nK}\operatorname{tr}XX^H \le 1$, it follows from (275), (280), (281), (282), and (290) that



(e n ' H ") =0(1) 


(291) 




(292) 




(293) 


£ f- Hn (t)=0(t 3 n- 3 ) 


(294) 


eT' H "(0 = ^ (i 3 n- 2 ) 


(295) 



where the notation $\varepsilon_n(t) = O(t^{\alpha} n^{-\beta})$ means that there exists $C > 0$ independent of $t$ and $n$
such that, for all $t > 0$ and $n \in \mathbb{N}$, $|\varepsilon_n(t)| \le C t^{\alpha} n^{-\beta}$.

Two remarks are important at this point. First observe that the successive introduction of
$\kappa_n^{X^n,H^n}$, $\nu_n^{X^n,H^n}$, etc., allows one to gain at each step one order of precision on the estimation of
$\phi_n^{X^n,H^n}$ (through refinements of the coefficients of its differential equation). The choice of the
order to be used is mainly ruled by the subsequent averaging steps. For the present proof, we
need the error (given by $\bar\varepsilon_n^{X^n,H^n}(t)$) to be within $O(n^{-2})$.

Second, it is very important to keep the terms in $t$ in the various bounds derived here and
below. The reason for this is twofold: (i) to solve the differential equations in $\phi_n^{X^n,H^n}$, then $\phi_n^{X^n}$,
it will be necessary to integrate these bounds and their integrability must be controlled, (ii) at
the end of the calculus, the normalization of $\Gamma_n$ by (the estimate for) its standard deviation $\theta_n^{X^n}$,
used to ensure a limiting unit variance, will be performed via a change of variable $t \mapsto t/\theta_n^{X^n}$
which requires a close inspection of the polynomials in $t$ and $n^{-1}$ in the bounds.
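The change of variable mentioned in remark (ii) can be made explicit. The following is a minimal sketch, assuming the characteristic function is well approximated by its Gaussian leading term; the symbols $\Gamma$, $\mu$, $\theta$ are generic stand-ins and all error terms are omitted.

```latex
% Sketch only: \Gamma, \mu, \theta are generic placeholders, error terms omitted.
\begin{align*}
\phi(t) = E\left[e^{it\Gamma}\right] \approx e^{it\mu - \frac{t^2}{2}\theta^2}
\quad\Longrightarrow\quad
E\left[e^{it\frac{\Gamma-\mu}{\theta}}\right]
= e^{-it\frac{\mu}{\theta}}\,\phi\!\left(\frac{t}{\theta}\right)
\approx e^{-\frac{t^2}{2}} .
\end{align*}
```

Under the same substitution, an error term of order $O(t^{\alpha} n^{-\beta})$ becomes $O(t^{\alpha}\theta^{-\alpha} n^{-\beta})$, which is why both the powers of $t$ and a lower bound on the standard deviation have to be tracked.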



Step 2: 

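In this step, we first solve (271) to express $\phi_n^{X^n,H^n}(t)$ as a function of $X$ and $H$. We then
proceed similar to Step 1 and express the function $\phi_n^{X^n}(t) = E[\phi_n^{X^n,H^n}(t)]$ as the solution of a
differential equation.

The solution of (271) reads

$$\phi_n^{X^n,H^n}(t) = e^{it\mu_n^{X^n,H^n} - \frac{t^2}{2}(\theta_n^{X^n,H^n})^2 + i\frac{t^3}{3}\kappa_n^{X^n,H^n}} \left(1 + \int_0^t e^{-ix\mu_n^{X^n,H^n} + \frac{x^2}{2}(\theta_n^{X^n,H^n})^2 - i\frac{x^3}{3}\kappa_n^{X^n,H^n}}\, \bar\varepsilon_n^{X^n,H^n}(x)\,dx\right). \tag{296}$$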
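For reference, the solution of a first-order linear differential equation of this type follows from the usual integrating-factor argument. The sketch below uses generic placeholders $a(t)$, $\alpha(t)$ and $\varepsilon(t)$ that are not the paper's notation, and assumes the initial condition $\phi(0) = 1$, which holds for any characteristic function.

```latex
% Sketch: integrating-factor solution of a scalar linear ODE.
% a(t), \alpha(t), \varepsilon(t) are generic placeholders, not the paper's notation.
\begin{align*}
\frac{\partial \phi(t)}{\partial t} &= a(t)\,\phi(t) + \varepsilon(t), \qquad \phi(0) = 1, \qquad \alpha(t) = \int_0^t a(x)\,dx,\\
\frac{\partial}{\partial t}\left[e^{-\alpha(t)}\phi(t)\right] &= e^{-\alpha(t)}\varepsilon(t)
\;\Longrightarrow\;
\phi(t) = e^{\alpha(t)}\left(1 + \int_0^t e^{-\alpha(x)}\varepsilon(x)\,dx\right),\\
a(t) = i\mu - t\theta^2 + it^2\kappa
&\;\Longrightarrow\;
\alpha(t) = it\mu - \frac{t^2}{2}\theta^2 + i\frac{t^3}{3}\kappa .
\end{align*}
```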
Define the following function $\phi_n^{X^n}(t) = E[\phi_n^{X^n,H^n}(t)]$. We then have from (296)



E 
E 
+ E 



(297) 



• X n ,.ff n + fnX. n ,H n \ 2 | ;+2 X^ff^A K/i 



• X n ,ff n . //iX n ,_H" n \2 .,2, X n ,H n 



v-n ttTi 2 / -v n T-f n \ 2 + 3 y"- w 7 "' 



. x",n" , , J /„X",fl"\ J .^3 X°.H" 

-lXfl n +^-[9n J "lyKn -X",H 



xe 



itfi n 



3/ J doc 
(298) 



We will now show that only the first term on the RHS of (298) is asymptotically non-negligible.
For this, consider the second term on the RHS of (298) (without the expectation):



X 



< 



vTL ttTI 2 f V >l tt71\2 3 Y n 



., X n ,ff™ t 2 /,,X n ,H™\ 2 . .t 3 X",H° 



- t (6^ J + It K r 



X n ,H n 



< (\»T> Hn \ + tO(l) + t 2 (n- 1 )) O (t*n~ 2 ) 

< \t£ n > Hn | O {t 4 n- 2 ) + O {t 5 n~ 2 + t 6 n- 3 ) . 
Moreover, by Jensen's inequality, 



x) | dx (299) 

(300) 
(301) 



logdet ( I 



1 HH" 

^ K 



+ V^K + J—N 



K \ a- 

= 0{n). 

Combining the last results, we have shown that 



< x j^-N\o^(l + —A+^K+J^-N 



(302) 

(303) 
(304) 



E 



. "Y"?i Tin 



t(6l n > Hn ) 2 + \t 2 K* n > Hn ) 



X 



-tag + ^(0 X ) -1%-*? ' =XP,H n 



X",H" i 2 /„X n ,H n \ 2 . .t 3 X. n ,H n 



^ n -t + t 5 n -2y 



(305) 
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Since all bounds are clearly integrable over t, this now means that e 
is an estimator of 0*" within Note that this bound would be 0(1) if we had only used 

an estimation of 0^' ' Hn within O^n^ 1 ) in the previous step. Hence the fundamental importance 
of the term k„ ' a . 



We can therefore proceed to study 0„ via the estimator e 
Starting back from (12981 ), we first verify that 



itfi n 



E 



X n FT"" 



c n „n\2 3 



f 3 X' l ,H" 



0{t 2 n 



Thus, we have 



dt 



=E 



(306) 



+ G((t 2 + t 4 )n- 1 +t 5 n- 2 ) 
(307) 



We now develop the term in the expectation and express it under the form of $f(X)\,\phi_n^{X^n}(t) +
\varepsilon_n^{X^n}(t)$ for some functional $f$ and asymptotically negligible quantity $\varepsilon_n^{X^n}(t)$. For better readability,
we define the shorthand notations

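$$\gamma_n^{X^n,H^n} = it\mu_n^{X^n,H^n} - \frac{t^2}{2}(\theta_n^{X^n,H^n})^2 + i\frac{t^3}{3}\kappa_n^{X^n,H^n} \tag{308}$$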



and consider individually the terms 



A: E 



B: E 



lX n ,H n \ 2 „7„ ' 



Term A: The term A cannot be evaluated in a straightforward manner as the integration by
parts formula (Lemma 8 in Appendix D) cannot be applied to the log-term in $\mu_n^{X^n,H^n}$ (as defined
in (272)). To avert this difficulty, we use the identity



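$$\log\det\left(I_N + \frac{1}{\sigma^2}\frac{HH^H}{K}\right) = \int_{\sigma^2}^{\infty} \frac{1}{u}\operatorname{tr}Q(u)\frac{HH^H}{K}\,du \tag{309}$$

which, together with the Fubini theorem (using $\operatorname{tr}Q(u)HH^H \le u^{-1}\operatorname{tr}HH^H$), gives for A: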



X. n ,H n 7 £ 
H'n c 



K j„2 u 



-E 



tiQ(u 



K 



du — 



txQ- 



HAH V 



K 



(310) 
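The integral representation of the log-determinant used for Term A can be checked numerically. The sketch below assumes the resolvent has the form $Q(u) = (B + uI_N)^{-1}$ with $B = HH^H/K$, which is how it appears to be used here but is not restated in this section; it works directly with hypothetical eigenvalues of $B$, so no matrix library is needed.

```python
import math

def logdet_direct(eigs, sigma2):
    """log det(I + B / sigma2) for a Hermitian PSD matrix B with eigenvalues `eigs`."""
    return sum(math.log(1.0 + lam / sigma2) for lam in eigs)

def logdet_via_resolvent(eigs, sigma2, steps=20_000):
    """Evaluate int_{sigma2}^{inf} (1/u) tr[Q(u) B] du, assuming Q(u) = (B + u I)^{-1}.

    In the eigenbasis of B, tr[Q(u) B] = sum_k lam_k / (lam_k + u). The substitution
    u = sigma2 / s maps the range to s in (0, 1], where the integrand becomes
    sum_k lam_k / (sigma2 + lam_k * s); the midpoint rule is applied in s.
    """
    h = 1.0 / steps
    total = 0.0
    for j in range(steps):
        s = (j + 0.5) * h
        total += sum(lam / (sigma2 + lam * s) for lam in eigs)
    return total * h

if __name__ == "__main__":
    eigs = [0.2, 1.0, 3.0]   # hypothetical eigenvalues of B
    sigma2 = 0.5             # hypothetical noise variance
    print(logdet_direct(eigs, sigma2), logdet_via_resolvent(eigs, sigma2))
```

The two printed values should agree up to the quadrature error of the midpoint rule.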



Before we continue, we need the following result which is the cornerstone of the subsequent
analysis:
Proposition 4: Let $u \ge \sigma^2 > 0$ and $\gamma_n^{X^n,H^n}$ be defined as in (308). Then,



(i) E 



trQ(w 



HH H x«,h" 



l-c + 2u5 (u)-^ (n) 2 ) 
1-c + u(1 + 25 (m)) 



E 



e 7 " 



n 5 Q (u) - o- 2 (?i(m) - g 2 7 2 (u) ^tr A t 
l-c + «(l+2«J (tt)) 



(z'z) E 



trQ 



HAH" 

W 



-e ln 



[»+(K5 (a*)-£) l0 ( ( r 2 )} ^t rA 
1 + cWa 2 ) 



E 



+ 



Ar^trA[ 7o (a 2 ) 


-1] 


1 


-0 + 2^(0-2) -^ (a 2 )2" 


o 2 [1 + 5 


W 2 )] 


1-0 + 0-2(1 + 250^))] 



g7n 



. f ^ i w ' £tr A [to (a 2 ) - 1] [fr (a 2 ) - <r 2 fr (a 2 ) - ^ fr 2 ) £tr A] e 



[l + 5 (a 2 )] [l-C + t7 2 (l+25o((7 2 ))] 



n' 7i(a 2 )(^trA-^trA 2 ) 

+ 1 ' WTT ttm^ 



(itr A) z [ 72 (a 2 ) (1 - 7o (a 2 )) - 7l (a 2 ) 2 ] + 5 (a 2 ) 7l (a 2 ) ±tr A 

K 



-E 



Px(t) | tP 2 jt) 1 ^ A2 



for some non-zero polynomials P(t), Pi(t), Pz(t) in t with nonnegative coefficients and with 
6 m (x) and 7m (x) given by Proposition [7] in Appendix O 

Proof: The proof is provided in Appendix C-C. ■
Applying Proposition 4 (i) and (ii) to the first and second terms of (310), respectively, we
obtain



E 



V 7 ^ 




)C (i-c + 2m5o(m)-^5 (m) 2 
m(1-c + 7x(1 + 25 (u))) 

[^+(5o(a 2 )-^)7o(<x 2 )] 1 



l + 5 (^ 2 
[7o (^) - 1] 



du 



tr A 



1 -c + 2a 2 5 ((x 2 ) 



2N2 



tr A > E 



n 
K 



a 2 [l + 5 ((7 2 )][l-c + ( 7 2 (l + 25o( ( 7 2 ))] K 

°° 5 (u) - o 2 5i{u) - a 2 7 2 (u) ^tr A 
2 l-c + u(l + 2<y (u)) 

^tr A [ 7o (a 2 ) - 1] [J (a 2 ) - a 2 ^ (a 2 ) - tx 2 7 2 (^ 2 ) £tr A] 
[l + 5 (a 2 )][l-c + a 2 (l + 25 (a 2 ))] 



3 7n 



7l (^(itrA-^trA 2 ) 



l + 5o(^ 2 

,2 



(^trA) z 72 (^ 2 ) (l-7o (a 2 ))- 7i (^ 2 ) 2 + 5 (a 2 ) 7 i (a 2 ) £tr A 



:i+^ 2 )r 



E 



+ 



Pi(t) , tP 2 (t) 1 



+ 



trA z 



(311) 



'K yK K 

where for the last RHS term, we used $\int_{\sigma^2}^{\infty} u^{-2}\,du < \infty$ and $P_1$, $P_2$ are non-zero polynomials
with nonnegative coefficients, possibly different from those of Proposition 4. Note in passing the
fundamental importance of maintaining $1/u$ in the big-O term of Proposition 4 (i). The existence
of the two integrals in (311) can be proved via bounds on the $\delta_t(u)$ and $\gamma_t(u)$, essentially relying
on their definitions (61)-(64) and on controls similar to Property 1 (i) and (ii) in Appendix D.
Nonetheless, a more immediate argument consists in remarking that, since the LHS of (311) is
finite, and so are all terms aside from the integrals on the RHS, so is the sum of the integrals.
Taking $t = 0$ then justifies with the same argument that the first integral is finite which, taking
then $t \ne 0$, ensures the finiteness of the second integral.

Also note that the last RHS term of (311) is not necessarily negligible in the large $n$ limit.
Indeed, for $X \in \mathcal{S}^n$, $\operatorname{tr}A^2$ can grow as $O(K^2)$, so that the whole term may grow as $O(\sqrt{K})$.
It is therefore essential to keep track of the terms in $A$. The pre-factor $t$ in front of $\frac{1}{K}\operatorname{tr}A^2$ will
play a significant role in controlling these terms at the end of the proof, which explains why we
also need to keep track of $t$ in the various bounds.
Term B: For the term B, we have 

2 .X",fl"" 



E 



(e* n > Hn ) e< 



E 



E 



E 



1 

K 



tr Q 



HH" 



K 



2a 2 / ifXX H #" 
+ — tr Q Q 



K 



nK 



1 n HH" a 2 2 HH H 2<r 2 2 H (I K - i-XX") H H 



K UQ K 



+ K UQ2 K 



K 



K 



a 4 „ 2 2o 2 „ 2 HAH H 



■trQ< 



K 



(312) 

(313) 
(314) 



To proceed with this term, which is essentially equal to the product of the expectations of the
two arguments, we rely on Remark 8 in Appendix D. Using Proposition 6 in Appendix C-A and
Proposition 7 in Appendix C-A to bound the variances of each term, we have



E 



X n ,J/" 



c - a%(a 2 ) - 2a 2 7l (a 2 ) j-tr A j E 



+ o 



(315) 



where we used in particular y/K~ 3 ti A 2 < 1/ \[K. 
Combining (307), (311), and (315), we finally obtain



dt 



■ "V" n 



(316) 
where




c (l - c + 2u5 (u) - ^5 (u) 
u(1-c + u(1 + 26q(u))) 



l + Ma*) K' 



du 



: tr A 



ho (a 2 )-!} l-c + 2a%(a 2 )-^5 (a 2 ) 2 



tr A 



a 2 [l + <5 ( (T 2)][l- c + ( T 2 (l + 25o(cr 2 ))] R 

5 (u) - a 2 5i(v) - o- 2 72 (u) j^tr A 
1 -c + u(l + 25 (u)) 
^tr A [ 7o (a 2 ) - 1] [Sp (a 2 ) - <r 2 fr (a 2 ) - a 2 l2 (a 2 ) jtr A] 

[l + 5o(^ 2 )][l-c + a 2 (l + 25 ( ( T 2 ))] 
7l (a 2 ) (^trA-^trA 2 ) 



(317) 



l + 6o(a 2 ) 



(itrA) 14 72 (^ 2 )(l-7o(^))-7i(^ 2 ) 2 + <$o (c 2 ) 7i (°" 2 ) 7? tr A 



(c - a%(a 2 ) - 2a% (a 2 ) 1* a) } 
and the quantity e*" (t) satisfies 



P 1 (t) , tP 2 (t) 1 a2 



+ 



(318) 



(319) 



4K \[K K 

for some non-zero polynomials with nonnegative coefficients P\(t) and P2 Ca- 
using Lemma QT] in Appendix |D] and the relations in (l6T|)-(|69l, the expressions of and 
9*™ can be simplified as: 



6*1 + Ci ~trA +C 217 trA 



K 



K 



(320) 
(321) 



where 0_, (o(z), (i(z), and ( 2 are defined in (|26l ). (|65T ). (|69l) . and (|68l ). respectively. 



We now relate $\phi_n^{X^n}(t) = E[\phi_n^{X^n,H^n}(t)]$ and $E[e^{\gamma_n^{X^n,H^n}}]$ with the help of the previously
established results. Starting from (296), one can show by the same arguments as in (301) that



< Mt n 



4„-2 



(322) 
for some constant $M$ independent of $H$, $t$, and $n$, from which



(t)=E[^(t)]=E 



O (t 4 n~ 2 ) 



or equivalently 



E 



(t) + (t 4 n- 2 ) . 



(323) 



(324) 



Replacing the last equation in (316) leads to



at 



Or - * (o 2 ) + (>r - 1 (o a ) ^ (* 4 -- 2 ) + c (*)■ (325) 



One can verify from (317) and (318) that

lC = 0{n) 



Hence 



at 



where e^ n (t) satisfies 



Step 3:



-x», + 1 , 1 . trA 2 



< (*) = o 



+ 



(326) 
(327) 

(328) 

(329) 



Solving the differential equation (328), we arrive at

0r (t) = <&?-$m* (i + J\-^ n ^ n )\T{x)dx^ 



(330) 
(331) 



with e^(t) = 

Take now a sequence of matrices X^X 2 , . . . such that 9^" > r\ > for all n and denote 



e " 



. Then, from (13301 



i * f r x"_ x™^ 



t 



n l ax" 



e e " 



£ \ -it^TT 

e «n 



(332) 
(333) 
(334) 
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To conclude, we need to control the term This is where the precision on 

ef n (t) from (13291) is used. Take t > fixed. First observe, by the definition of (6»^™) 2 in (13181) 
and by the positivity assumption that 

1 



:trA 2 -5 



(335) 



for some constant B > obtained via the bound < itr A < 1, where we recall that 71 (a 2 ) > 
(see (O). 

Discarding the fixed term t from the bounds of e^ n , we have 



-X" 

n 



f)X n n \ ax n 

n \ n 



o 



(336) 
(337) 
(338) 



where, in the last equality, we used P 1 {0% n )~ l < Pi(^ _1 )?7 -1 , P 2 (^f)" 1 ) < 

P 2 (tr]~ 1 ), both bounded for t fixed, and, from (1335b . 



krA 2 



K 



< 



^trA 2 



^^^trA 2 -5 



max yj, (1+5()((j2))2 K 
(l + 5o(^ 2 )) 2 ((TOTF^ trA2 - 5 



5 



7i(^ 2 ) 



(339) 



(340) 



(341) 

7i(ct*J v V 

Note here the crucial importance of having a positive lower bound on $\theta_n^{X^n}$ to avoid a possible
divergence of the bounds in $1/\theta_n^{X^n}$.
We conclude that



0X» 



o 1 4= 



(342) 



Take now <E P (<S n ) such that 0% n > rj > 0, almost surely, for all n and let n (t) = E (t) 
Then, from ([334]) and (13421 . 



(343) 
Taking $t < 0$, and using $\phi_n(-t) = \phi_n(t)^*$, the result above generalizes to $t \in \mathbb{R}$.
This implies by Lévy's continuity theorem that

$$\frac{\Gamma_n - \mu_n}{\theta_n} \Rightarrow \mathcal{N}(0,1) \tag{344}$$

where we have defined $\mu_n = \mu_n^{X^n}$ and $\theta_n = \theta_n^{X^n}$. This terminates the proof.

C. Proof of Theorem 4 in Section IV-B2
Outline of the proof:

The proof of this theorem follows closely the proof of Theorem 3. The major difference is
that the entries of $X^n$ are now i.i.d. standard complex Gaussian. This implies in particular that
the peak energy constraint $\frac{1}{nK}\operatorname{tr}X^n(X^n)^H \le 1$ in (11) does not hold and must be changed into
moment bounds. The proof is split into two steps:

1) Our starting point is the differential equation (279) in $\phi_n^{X^n,H^n}(t)$ which we integrate to obtain
an estimation of $\phi_n^{X^n,H^n}(t)$ which depends only on $X^n$ and $H^n$. In contrast to the proof
of Theorem 3, we then define $\phi_n^{H^n}(t) = E[\phi_n^{X^n,H^n}(t)]$ and take the conditional expectation
over $X^n$ first rather than $H^n$. This is necessary because no bounds on $\frac{1}{nK}\operatorname{tr}X^n(X^n)^H$ are
available. By averaging over $X^n$ first, this problem can be circumvented. We then proceed
to standard Gaussian calculus to obtain a second differential equation of the form

$$\frac{\partial \phi_n^{H^n}(t)}{\partial t} = \left(i\mu_n^{H^n} - t\,(\theta_n^{H^n})^2 + it^2\kappa_n^{H^n}\right)\phi_n^{H^n}(t) + \varepsilon_n^{H^n}(t) \tag{345}$$

for some $\mu_n^{H^n}$, $\theta_n^{H^n}$, $\kappa_n^{H^n}$, and $\varepsilon_n^{H^n}(t)$ whose asymptotic scaling properties are analyzed.
We then integrate this differential equation which yields an estimation for $\phi_n^{H^n}(t)$ as a
deterministic function of $H^n$.

2) The last step consists in the calculation of the expectation over the channel matrix H n . 
We introduce the function 0„(t) = E [0^™(i)] and show similarly to step 1) that 

(j) n (t) = jt^KC-iel + q (p(t) n -i) (346) 

for some polynomial P(t) with positive coefficients. We then define 4> n (t) = E e" e + ^ N < K ' 
and demonstrate that 

+2 



<j> n (t) -> e-T (347) 

for $t \in \mathbb{R}$ as $n \to \infty$, which together with Lévy's continuity theorem [24, Theorem 26.3]
terminates the proof.
Step 1:

Similar to the proof of Theorem 3, it is sufficient to assume $t > 0$. We begin with the
differential equation (279) which is still valid as no bound on $X^n$ has been used in its derivation.
In contrast to the proof of Theorem 3, we cannot use (271) since the term $\bar\varepsilon_n^{X^n,H^n}(t)$ is of order
$O(n^{-2})$ whereas $\varepsilon_n^{X^n,H^n}(t)$ in (279) is of order $O(n^{-3})$ (see (294), (295)). In the course of the
proof, we will integrate a differential equation after successively taking expectations over $X^n$
and $H^n$. Since each integration multiplies the residual error terms by a factor of order $O(n)$,
we need to work with the refined version (279) of (271) to obtain a final approximation error
of $O(n^{-1})$.

We integrate (279) over $t$ to obtain the following result:



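$$\phi_n^{X^n,H^n}(t) = e^{it\mu_n^{X^n,H^n} - \frac{t^2}{2}(\theta_n^{X^n,H^n})^2 + i\frac{t^3}{3}\kappa_n^{X^n,H^n} + \frac{t^4}{4}\nu_n^{X^n,H^n}} \left(1 + \int_0^t e^{-ix\mu_n^{X^n,H^n} + \frac{x^2}{2}(\theta_n^{X^n,H^n})^2 - i\frac{x^3}{3}\kappa_n^{X^n,H^n} - \frac{x^4}{4}\nu_n^{X^n,H^n}}\, \varepsilon_n^{X^n,H^n}(x)\,dx\right) \tag{348}$$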



Next, we consider the conditional expectation over $X^n$. Denote $\phi_n^{H^n}(t) = E[\phi_n^{X^n,H^n}(t)]$. From
(348),



at 



E 



9* 



(349) 



E 

+ E 



X n ,H n t 2 fnX n ,H n \ 2 -* 3 H™ *4 r n H n" 



j-l^n ) +1 — K» +-f n 



x / e 
'o 



-ia;/i n +^-(6»„ 



2 3 ^fi Tin 4 tt n 



vC J doc 



+ E[er ,Hn (*)]- (350) 
We first focus on the last term on the RHS of (350). By (290) and Jensen's inequality, it
follows that



\E[er^(t)]\<E[\^(t)\] 

t 3 ( I2N 



< 



t 3 



:trXX H 



nK 



2N 



+ 5WE 



nK 



: trXX H 



y/n^K 3 ^ V K 
= O (t 3 n- 3 ) . 

For the second term on the RHS of (350) (without the expectation), we obtain:



X n ,H n t 2 f X n ,H n \ 2 ..t 3 X",H™t 4 I",H" 



itfj,„ 



x / e 
'o 



(351) 
(352) 

(353) 
(354) 

(355) 



< e 



i 2 / fl X'\H™\ 2 . t 4 , 



£ 2 / yTl T-T" \ 2 4 



(a;)cix 



e? n,H "fx)| dx. 



(356) 



As opposed to the proof of Theorem 3, the introduction of the term $-\frac{x^4}{4}\nu_n^{X^n,H^n}$ in the integral
on the RHS of (356) incurs a technical difficulty which must be overcome to bound the RHS of
(356) as in (299). To solve this difficulty, note that

1 



,X n ,H" 



nK 2 



tr Q 



HH H \ 4 4a 2 „ / HH H \ 2 HXX H H H 



if 



< 



:tr Q 



HH H \ 2 2a 2 



ifn \K V # 

(C' Hn ) : 



„HXX H H H 

+ trQ 2 

K ^ 



2 //)X n ,H n ^ 2 

ifn 



(357)
(358)
(359)

where the inequality is due to $\left\|Q\frac{HH^H}{K}\right\| \le 1$. Consider now the polynomial
$f_n(x) = \frac{x^2}{2}(\theta_n^{X^n,H^n})^2 - \frac{x^4}{4}\nu_n^{X^n,H^n}$, which achieves its maximum at
$x^* = \theta_n^{X^n,H^n}/\sqrt{\nu_n^{X^n,H^n}}$ and is monotonically increasing on the interval $[0, x^*]$.
By (359), $x^* \to \infty$ as $n \to \infty$ so that $\max_{x \in [0,t]} f_n(x) = f_n(t)$
for all large $n$. Thus, for $n$ large,



Jo 

t 4 



2 n 4 vTL t_t 71 

g 2 P V7,n 4 n 



J doc 



< 



< 



Mi 4 



2Ar 5,/J.tr.Y.Y" 



n" 



( 1 + — trXX H ) 
V nK J 



(360) 
(361) 
(362) 
(363) 



for some constant $M$, independent of $n$, $K$, where we have used the fact that $\sqrt{x} \le 1 + x$. Thus,



E 



• x n ,n n 

1 H'n 



-j-lfn ) +1 — «n 



x / e 
'o 

Mt 4 



n pin 2 / -v^n til \ 2 3 xj n 4 v n tr n 



-IX jLt n 



Jf»,H" 



By (280), (281), and (282), we have the following bounds

(^ n . H ") 2 <M'fl + ^trXX H 



< 



M" 



n 



M' 

x ,h < 

n — 2 



/// 



1 

i 



trXX H 



(364) 

(365) 
(366) 
(367) 



n* \ nK 

for some constants $M'$, $M''$, $M'''$, independent of $n$ and $H^n$. Moreover, by Lemma 4 (ii) in
Appendix C-A,



E 



1 + tvXX" 

nK 



E 



<4 + 



nK 



:trXX H 



+ Var 



nK 



:trXX H 



nK 
O(l). 



(368) 

(369) 
(370) 
In addition,

E 



«r H "l(l + ^.rA'.Y" 



< E 



n , , f T 1 HH H 

-logdet 1*+- — 



n _ HH H pa HXX H H H \ / 1 H 



(371) 



logdet ( I 



*-N 



1 HH 1 

^~K 



+ trQ 



HH h 

K 



E 



fri HXI H / 1 vvH \ 

J— trQ 1 + trXX H 

\ K ^ nK \ nK J 



(372) 



, n / , / 1 HH 

<2W — (logdet Ijv ' 



(7" 



+ 2trQ- 



HH h 



A" 



'Var 



n HXX H H H 

nK 




Var 



2W^ logdet I. + - 2 — 



HH H 

+ 2trQ-^- ) + C)(I). 



nK 
(373) 

(374) 



: trXX H 



Using (13651) . (13661) . (I367l> . (1575b . and (13741) in (13641) . we can conclude that 



E 



X n ,H n t 2 ( Q X n ,H n \ 2 . . X",H" . i 4 Jf n ,H" 



2 3 yn u-n 4 yn tt n 



x / e 
'o 

< ( I +2 A /-f logdet + 



2trQ- 



HH H \ \ M""t 4 



K II iv 

for some constant $M''''$, independent of $n$ and $H$.

Equation (350) together with (376) and (354) implies that



(375) 
(376) 



E 



, . v-n H ri ,2 / yn it?) \ 2 3 v n rrn ,4 v-ti \jn 

(Vf > H " - t (C H ") 2 + it^f ' H " + t 3 ^" H ") e*** ) ' 



where satisfies 



eT(t)\ < 



{t 3 + i 4 )M'"" 
n 

rum 



1 + 2 A /-^ ^logdet yi N ^-— 



1 HH H \ ^HH H 

+ 2trQ— 



(377) 



(378) 



for some constant M , independent of n and H. 






We now develop the main term in (377) to obtain a differential equation for $\phi_n^{H^n}(t)$. In order
to simplify the following notations, we introduce the quantity

v / „ Y~ n it/I \ Z . vn T~I " v 



X n ,H" 



Using the differentiation rules in Lemma 10 in Appendix D, one can show that



(379) 



dX 

dX*, 



it 



H H QHX 



t 2 a 2 TH H Q 2 HX 



3^-2 



+ 



it 6 a 



H H Q 2 HH H QHX 



4^-2 



+ 



H H Q 2 (QHH H ) 2 HX 



(380) 



H . 

- trQ— — 



E[e^] + A /-E 



tr — — e A 



Consider now the first term in (377) (without the leading $i$):

H QH7 vH 
(381) 

We can develop the last term in this equation with the help of Lemma 5 in Appendix C-A and
(380) as follows:



n ^ 
— E 
K 



tr 



H H QHXX H A 

nK 6 



-^trQ^E [e A ] +E 



-y 



i,3 



X H H H QH 



(382) 



^trQ^E [e A ]+i*E 



1 

— tr 

K 



H H QH\ 2 XX H 
• 

x y n 



t 2 E 



a 2 H H QHH H Q 2 HXX H 



-.tr 



nK 2 



(383) 

+ 0(t 3 n- 2 + t A n- 3 ) 
(384) 



where the last equation follows from the fact that |e A | < 1 for n sufficiently large (see (13591 )) 
and the terms in t 3 and t 4 are consequently of order O (n~ 2 ) and O (n~ 3 ), respectively. 
Applying the same steps to the second term in the last equation, we arrive at 



HE 



1 / H H QH\ 2 XX H 



— tr 

K 



it— M ( Q 



K 



HH h 



n 



K 



E [e A ] - t 2 E 



V nK 3 



H H QH\ 3 XX H 



K 



n 



+ O (t 3 n- 2 + t 4 n- 3 ) . 

(385) 
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Since I e A I < 1 for n large and by Lemma |4] [ii) in Appendix IC-AI 



Var 



-AM 



H H QH\ 3 XX H 
W~ ) n 



< 



n 2 R 3 



tr 



h h qh V 

K ) 



O (n- 4 ) 



(386) 



it follows from Lemma [7J in Appendix O that 



E 



1 



v^a 3 " 

Similarly, 



tr 



H H QH\ 3 XX H 



n 



tr (V^y E [e A ]+0(n- 2 ). (387) 



Var 



a 2 H H QHH H Q 2 HXX H 
tr 



a/ riA 3 



n 



K 2 



< 



2a 4 



rv 



l K 3 



trQ 2 Q 



HH 



A" 



(388) 



by Lemma |4] (ii) in Appendix IC-A[ so that by Lemma [7J in Appendix O 



E 



a 2 H H QHH H Q 2 HXX H 
tr e 



V^A 3 "" nK 2 
Combining the last results yields 



a- 



VnA 3 " 



trQ Q 



HH h 



K 



E [e A ] + O (n- 2 ) . (389) 



IT* ir- 

— E 
A 



tr 



H H QHXX H A 

ttA 6 



fn _ HH H r A1 1 / HH H \ 2 r X1 
^-trQ— E [e A ] + it-tr (Q— J E M 



VnA 3 



tr Q 



HH 



H\ 3 



A' 



(7 



trQ Q 



HH H 

~A~ 



E[e A ] 

E[e A ] +0((t 2 + t 3 )n- 2 + t A n- :i ). 

(390) 



The last expression can be further simplified since tr ^Q^- j = tr ^Q^- j — <x 2 tr Q ^Q^S- j • 
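Simplifications of this kind rest on the resolvent relation $Q\frac{HH^H}{K} = I_N - \sigma^2 Q$. As a hedged sketch, assuming $Q = (B + \sigma^2 I_N)^{-1}$ with $B = HH^H/K$ (the definition is not restated in this section), this relation gives $\operatorname{tr}(QB)^{k+1} = \operatorname{tr}(QB)^k - \sigma^2 \operatorname{tr}Q(QB)^k$ for every integer $k \ge 1$. Since $Q$ and $B$ commute, the identity can be checked eigenvalue by eigenvalue; the function name and the sample eigenvalues below are hypothetical.

```python
def resolvent_power_traces(eigs, sigma2, k):
    """Return (tr (QB)^(k+1), tr (QB)^k - sigma2 * tr Q (QB)^k) from the eigenvalues of B.

    Assumes Q = (B + sigma2 * I)^{-1}, so Q and B share eigenvectors.
    """
    lhs = 0.0
    rhs = 0.0
    for lam in eigs:
        q = 1.0 / (lam + sigma2)   # eigenvalue of Q
        qb = q * lam               # eigenvalue of Q B, equal to 1 - sigma2 * q
        lhs += qb ** (k + 1)
        rhs += qb ** k - sigma2 * q * qb ** k
    return lhs, rhs

if __name__ == "__main__":
    for k in (1, 2, 3):
        # Both entries of the returned pair should coincide for each k.
        print(k, resolvent_power_traces([0.2, 1.0, 3.0], 0.5, k))
```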
Using this result and gathering all terms together, we have finally proved that 

E[i^H V] ^iyflogdet^ + l^E^] -titr (q^)W] 



it 2 



tr Q 



HH H 



A 



E [e A ] . 



Consider now the second term of (377) (without the leading $-t$):



E 



X",H"\ 2 A 



(e,f' H ) 



HH 



H\ 2 



A 



E [e A ] + E 



2a 2 H H Q 2 H XX H A 
tr e A 



A 



A 



77 



(391) 



(392) 
For $n$ sufficiently large, we can develop the second term in the last equation with the help of
Lemma 5 in Appendix C-A as follows:



E 



2a 2 H H Q 2 H XX H A 
tr rr e 



K 



K 



n 



2^ 
K 



trQ 



,HH h 

K 



■E [e A ] 



+ itE 



2<x 2 H H QHH H Q 2 H XX H 



y/nK 3 



tr 



K 2 



n 



Since 



Var 



2a 2 H H QHH H Q 2 H XX H 
=tr — 



O (n- A ) 



+ O (t 3 n- 2 ) . 

(393) 

(394) 



.VnK 3 "' K 2 n 

by Lemma |4] (ii) in Appendix IC-A[ it follows from Lemma [7] in Appendix O that, for all n 

large, 



E 



2a 2 



H H QHH H Q 2 H XX H A 

tr — e 



Thus, we have shown that 



n 



2a 2 



ynK 3 



trQ 



QHH 



H\ 2 



E [e A ] + O (t 3 n- 2 ) . (395) 



E 



-t(em 2 e X }=-t^tr[Q 



HH H 



E[e A ]-t^trQ 2 ^E [e A ] 



it 2 



2a 2 



tr Q (^^) 2 E [e A ] + O (t 3 n" 2 ) . (396) 



The third term of (377) can be treated in exactly the same way, leading to



E [it 2 ^"- Hn e A ] 



if 2 



y/nK 3 



tr Q 



(397) 



For the fourth term of (]377) . it follows directly from (|367) and |e A | < 1 (for n sufficiently) 
large, that 

E [t 3 .f' H "e A ] = O (t 3 n- 2 ) . (398) 
Combining (377), (391), (396), (397), and (398), we finally arrive at



(v*r - * (O 2 ) E [e A ] + eT(t) + O ((t 2 + t 3 + S)n- 2 ) (399) 



where 



logdet ( I 



ri n 



m 2 = f * (q^ 



L/V 



1 HH 1 

^ 2_ ^" 



2(T 2 ^,HH H 2 ^HH 

H trQ 2 = —trQ 

K ^ K K ^ K 



(400) 
(401) 
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and where we observe in particular the absence of a k^" term premultiplying E[e A ] as in the 
previous steps. 

Next, we establish a relation between and E [e A ] in order to transform (1399t into a 

differential equation for Taking the expectation on both sides of (13481) . it follows from 

(13561) and (13631) that 

(j% n {t) = E[e A ] +0(t 4 n' 3 ). (402) 
By (12291) . (9f n f = 0(1) and = O (n" 1 ) which, together with (1599b and (1378b . yields 



9i 



where e„ satisfies 



1 HH H 



^ I ^ " ( 1 + 2 \Ik i l0gd6t i 1 " 1 a* K 



2trQ 



HH H 



(403) 



(404) 



n" \ V /v \ \ <r- /v / if 

for some constant M, independent of n and H. Integrating this differential equation in t leads 



to 



1 + 



(405) 



Step 2: 



We now proceed with the final integration of (405) over $H^n$. In order to simplify the notations,
we define

,9 

, 2 



(406) 



where we recall that fi^ n , 9^ n , and k^" denote the quantities /i^", 0^*™, and k^™ as defined in 
(14001) — (140 1 1) with H" replaced by the random variable H n . We further define <p n (t) = E . 
Then, by (1355b . 



<9t 



= E [(Vf - t (C) 2 ) e- (l + J* e--»"^K") 2 £ f (x)rfl j + eT(t) 
From (404), Jensen's inequality, and (229),

E [| £ -r(*)|] < ^ (l + 2.V + 2,/^logde, fl„ + i E r ™" 



(407) 



A' 



(P(t)n- 2 ) 

(408) 
where $P(t)$ denotes a generic polynomial in $t$ with positive coefficients. In what follows, the
same $P(t)$ will be used to denote possibly different such polynomials. Similarly,



H n :x 2 (nH n ^ 2 



n > (x)dx 



< E [iiT] E 
= C {P(t)n- 1 ) 



^ ditjC 



O {P{t)n~ 2 ) 



(409) 

(410) 
(411) 



where the last inequality is due to Lemma [7J in Appendix O the bound on the variance of the 
term logdet (l N + jz^jf-^ provided in (1453b . and the observation that (O^) 2 = 0(1) and 
K u n = Q {n~ l ). Combining (14071) . (14081) and (|411|) . we conclude that 



at 



E 
E 



HO 2 



+ (P(t)n- 1 ) 
+ (P(t)n- 1 ) 



(412) 
(413) 



where the last line is due to it 2 K^"e x = O (P(t)n -1 ). 

We will now develop the RHS of (14131) . Similar to the proof of Theorem |3j the term E [i//^™e x ] 
can be expressed as 



E KV] = E 



K logdet \ls + -^- ir 



E 



OO -y H H^ 

—txQ(u) — ^^due x 



2 u 



-E 



\xQ{u 



K 
HH 



H 



K 



du. 



K J a 2 u 

The analysis of the expectation in (416) is deferred to the proof of the following result:
Proposition 5: Let $u \ge \sigma^2 > 0$ and $\chi$ as defined in (406). Then,

\ -c + 2ud (u) - ^5o(m) 2 ) 



(414) 
(415) 
(416) 



E 



tr Q(u 



K 



N 



l-c + u(l + 2<y (u)) 
n Sq(u) — a 2 5i{u) 



-E \e x ] 



E \e x ] + O 

K l-c + u(l + 2<J (w)) V"" 



lb- 



Proof: The proof is provided in Appendix C-B.
Using Proposition 5 in (416) yields

l-c + 2u5 (u)-^8 (u) 2 



E [/if e x ] = V^K 



. n 

+ lt K 



u{l - c + u{l + 25 {u))) 

5 (u) - (7 2 (5i(m) 



du 



-duE \e* 



(417) 
(418) 



T 2 1- c + u(l + 25 (u)) 
where the term 0{n~ l ) is obtained from -^du < oo. Note here that both integrals are well 
defined by the finiteness of the LHS and of which implies the well definiteness of the 

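individual integrals when taking $t = 0$, then $t \neq 0$. These integrals are evaluated via Lemma 11
in Appendix D, from which

$$
\begin{aligned}
c\int_{\sigma^2}^{\infty}\frac{1 - c + 2u\delta_0(u) - \frac{u^2}{c}\delta_0(u)^2}{u\left(1 - c + u\left(1 + 2\delta_0(u)\right)\right)}\,du
&= \log\left(1 + \delta_0(\sigma^2)\right) + c\log\left(1 + \frac{1}{\sigma^2}\frac{1}{1 + \delta_0(\sigma^2)}\right) - \frac{\delta_0(\sigma^2)}{1 + \delta_0(\sigma^2)} = C && (419)\\
\int_{\sigma^2}^{\infty}\frac{\delta_0(u) - \sigma^2\delta_1(u)}{1 - c + u\left(1 + 2\delta_0(u)\right)}\,du
&= -\log\left(1 - \frac{\delta_0(\sigma^2)^2}{c\left(1 + \delta_0(\sigma^2)\right)^2}\right). && (420)
\end{aligned}
$$

Thus,

$$
\mathbb{E}\left[\mu_n^{H^n} e^{x}\right] = \left(\sqrt{nK}\,C - it\beta\log\left(1 - \frac{\delta_0(\sigma^2)^2}{c\left(1 + \delta_0(\sigma^2)\right)^2}\right)\right)\mathbb{E}\left[e^{x}\right] + O\left(n^{-1}\right). \tag{421}
$$

Consider now the second term in (413) (without the leading $-t$):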



E 



H n \ 2 



(O 



2E 



(a) 



(&) 



2E 



1 HH V 



c-a 2 — trQ ) e 



2a 2 E 



— trQ 



E [e x ] + O (n- 1 ) 



(419) 
(420) 

(421) 

(422) 
(423) 
(424) 
(425) 



- 2cE [e x 

= 2cE [e*] - 2a% (a 2 ) E [e x ] + O (n' 1 ) 
= 2 (c - <x 2 <5 (a 2 ) ) E [e x ] + 0(n- 1 ) (426) 

where (a) is due to (224), (b) follows from Lemma 7 in Appendix D and Proposition 6 in
Appendix C-A, and (c) results from Theorem 5 in Appendix D.
Combining (413), (421), and (426), we have shown that

dMt) 



Of 



(iV^KC - te 2 + ^j E [e x ] + O (P(t)r^ 1 ) 



(427) 
where we recall from (27) the definition

el- 



<) +2 (f "AM)- (428) 



c(l + 5 (a 2 )) 2 , 

One can see from Property 1 (i) and (ii) in Appendix D and from $0 < c - \sigma^2\delta_0(\sigma^2) < c$, that



Q\ > 0. 



We now establish a relation between $\phi_n(t)$ and $\mathbb{E}[e^{x}]$ in order to establish a differential
equation for $\phi_n(t)$ from (427). From (405),



n (t)=E[e*]+E 
where, by (408) and $|e^{x}| \le 1$,



E 



1 —ff n / \ i 

n > e" (x)dx 



< E 
= O {P(t)n- 2 ) 

Thus, we can replace $\mathbb{E}[e^{x}]$ in (427) by $\phi_n(t) + O(P(t)n^{-2})$ to obtain

d<f> n (t) 



e n {x)\dx 



Of 



(iV^KC - te 2 + ) <f> n (t) + O {P(t)n- 1 ) . 



Integrating this differential equation with respect to t leads to 



<f> n (t) = e 



UVnKC-^-e 



Denote cj) n (t) = E 



+ \ N,K 



-c) 



which is well defined since 6\ > 0. Then, 



l VriKC 



r: T+o(p(^-)n 



-» e 2 



-i 



which implies by Lévy's continuity theorem [24] that



(Ck - c) 



AT (0,1). 



(429) 

(430) 
(431) 

(432) 

(433) 



(434) 

(435) 
(436) 

(437) 



and concludes the proof. 
D. Proof of Proposition 2 in Section IV-B1

As in previous sections, we often drop the index n in matrix notations. First recall the 



definitions 



(438) 



<7 



H tr 



(HH H + CT 2 Ijv) 1 (HX + W / )(HX + W / ) H -iyW /H 



and define the function 



E 



,X",H" x" 
l N,K 



(439) 



(440) 



By (296) and the elementary properties of the characteristic function,



E 



VnKI N £ - ii n 



dt 2 



K n ' H "-^r) 2 +(e" H ") 2 (44D 



t=o 



where we recall the definitions from (272) and (273)

1 HH H 



X n H n 



— log det ( I N + 



(7- 



K 



n ^HAH 
— tr Q 

K ^ K 



H 



(442) 



HH H \ 2 2<x 2 



, „HXX H H H 

K \ nK 



(443) 



The absence of additional terms is linked to e^ n ' nn (t) which is of the form £^"' Hn (t) 
t 2 M[f(W n 7 X. n ,U n )e itT ] (see (I275T) ). 
Therefore, 

2" 



E 



E 



(/i: 



X n ,#" ,,X"\ 2 | (nX. n ,H n \ 2 



(444) 



where we recall that $I_{N,K}$, $\mu_n^{X^n,H^n}$, and $(\theta_n^{X^n,H^n})^2$ are respectively defined by (439), (442), and
(443) with $H$ replaced by the corresponding random matrix. Notice now that $\theta_n^{X^n,H^n} = O(1)$ and, by
Theorem 6 in Appendix D and Proposition 7 in Appendix C-A,



(445) 
Thus,



E 



Var[^]+^(^A 2 )+0(1). 



X n ,H n 
H'n 



K 3 



trA 2 



0(1) (446) 
(447) 



From Remark 8 in Appendix D,



Var •*"] < £ ( JVar 



logdet 1 N + 



1 HH H 

o* K 



+ WVar 



trQ 



#AiP 



A' 



(448) 



From Proposition 6 (iii) in Appendix C-A, we know that $\operatorname{Var}\left[\operatorname{tr} Q\frac{HAH^{\mathsf H}}{K}\right] = O\left(\frac{1}{K}\operatorname{tr} A^2\right)$. It
remains to find a bound for the variance of the first term in (448). By Lemma 9 in Appendix D,

1 HH H 



Var 



logdet I/v + 



a 2 A 



a 



1 d(HH H ) 

—trO— - 

A 



Or* ^ A 

1,3 L 

2 1 2 HH H 
— r— trQ 2 

a 4 A v fsf 

0(1). 



(449) 

(450) 

(451) 

(452) 
(453) 



Thus, $\operatorname{Var}\left[\mu_n^{X^n,H^n}\right] = O\left(1 + \frac{1}{K}\operatorname{tr} A^2\right)$. Plugging this result into (447), we finally obtain



E 



(454) 
Appendix C
Additional random matrix results 

A. Auxiliary results 

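Lemma 4: Let $G \in \mathbb{C}^{M\times L}$ have i.i.d. entries $G_{ij} \sim \mathcal{CN}(0,1)$ and let $S \in \mathbb{C}^{L\times M}$ and $T \in \mathbb{C}^{M\times M}$. Then,

(i) $\operatorname{Var}[\operatorname{tr} SG] = \operatorname{tr} SS^{\mathsf H}$

(ii) $\operatorname{Var}[\operatorname{tr} TGG^{\mathsf H}] \le 2L\operatorname{tr} TT^{\mathsf H}$.

Proof: The proof of part (i) follows directly from

$$
\begin{aligned}
\operatorname{Var}[\operatorname{tr} SG] &= \mathbb{E}\left[\left|\operatorname{tr} SG\right|^2\right] && (455)\\
&= \sum_{i=1}^{M}\sum_{j=1}^{L}\left|S_{ji}\right|^2 && (456)\\
&= \operatorname{tr} SS^{\mathsf H}. && (457)
\end{aligned}
$$

Part (ii) can be proved relying on Lemma 9 and Lemma 10 in Appendix D:

$$
\begin{aligned}
\operatorname{Var}[\operatorname{tr} TGG^{\mathsf H}]
&\le \sum_{i=1}^{M}\sum_{j=1}^{L}\mathbb{E}\left[\left|\frac{\partial \operatorname{tr} TGG^{\mathsf H}}{\partial G_{ij}}\right|^2 + \left|\frac{\partial \operatorname{tr} TGG^{\mathsf H}}{\partial G_{ij}^*}\right|^2\right] && (458)\\
&= \sum_{i=1}^{M}\sum_{j=1}^{L}\mathbb{E}\left[\left|\sum_{q=1}^{M}\sum_{p=1}^{M}T_{pq}\frac{\partial\left[GG^{\mathsf H}\right]_{qp}}{\partial G_{ij}}\right|^2 + \left|\sum_{q=1}^{M}\sum_{p=1}^{M}T_{pq}\frac{\partial\left[GG^{\mathsf H}\right]_{qp}}{\partial G_{ij}^*}\right|^2\right] && (459)\\
&= \sum_{i=1}^{M}\sum_{j=1}^{L}\mathbb{E}\left[\left|\sum_{q=1}^{M}T_{qi}G_{qj}^*\right|^2 + \left|\sum_{p=1}^{M}T_{ip}G_{pj}\right|^2\right] && (460)\\
&= \sum_{i=1}^{M}\sum_{j=1}^{L}\left(\sum_{q=1}^{M}\left|T_{qi}\right|^2 + \sum_{p=1}^{M}\left|T_{ip}\right|^2\right) && (461)\\
&= 2L\operatorname{tr} TT^{\mathsf H}. && (462)
\end{aligned}
$$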
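Lemma 4 lends itself to a quick Monte Carlo sanity check. The sketch below is illustrative only: the function name `lemma4_monte_carlo`, the dimensions, the seed, and the number of trials are assumptions, and $S$ and $T$ are drawn once at random and then treated as fixed. For circularly symmetric Gaussian entries, a direct computation gives $\operatorname{Var}[\operatorname{tr} TGG^{\mathsf H}] = L\operatorname{tr} TT^{\mathsf H}$, so the bound in part (ii) is expected to hold with a factor of two to spare.

```python
import numpy as np

def lemma4_monte_carlo(M=3, L=4, trials=20000, seed=0):
    """Empirical variances of tr(SG) and tr(TGG^H) for G with CN(0,1) entries."""
    rng = np.random.default_rng(seed)
    S = rng.standard_normal((L, M)) + 1j * rng.standard_normal((L, M))
    T = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
    # Batch of independent G matrices, each entry CN(0,1).
    G = (rng.standard_normal((trials, M, L))
         + 1j * rng.standard_normal((trials, M, L))) / np.sqrt(2)
    tr_SG = np.einsum('ji,tij->t', S, G)                    # tr(S G) per trial
    tr_TGGh = np.einsum('pq,tqj,tpj->t', T, G, G.conj())    # tr(T G G^H) per trial
    return {
        'var_trSG': np.var(tr_SG),          # np.var uses |x - mean|^2 for complex x
        'tr_SSh': np.trace(S @ S.conj().T).real,
        'var_trTGGh': np.var(tr_TGGh),
        'L_tr_TTh': L * np.trace(T @ T.conj().T).real,
    }

res = lemma4_monte_carlo()
print(res)
```

The first pair of values illustrates part (i); the second pair illustrates that the empirical variance sits below the bound $2L\operatorname{tr} TT^{\mathsf H}$ of part (ii).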



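Lemma 5: Let $G \in \mathbb{C}^{M\times L}$ have i.i.d. entries $G_{ij} \sim \mathcal{CN}(0,1)$. Let $T \in \mathbb{C}^{M\times M}$ be a deterministic
matrix and $\omega$ be a function of $G$. Then,

$$
\mathbb{E}\left[\operatorname{tr} TGG^{\mathsf H} e^{\omega}\right] = L\operatorname{tr} T\,\mathbb{E}\left[e^{\omega}\right] + \mathbb{E}\left[\sum_{k,j}\left[G^{\mathsf H}T\right]_{jk}\frac{\partial\omega}{\partial G_{kj}^*}\,e^{\omega}\right].
$$

Proof: By Lemma 8 in Appendix D,

$$
\begin{aligned}
\mathbb{E}\left[\operatorname{tr} TGG^{\mathsf H} e^{\omega}\right]
&= \mathbb{E}\left[\sum_{i,j,k}T_{ik}G_{kj}G_{ij}^*\,e^{\omega}\right] && (463)\\
&= \sum_{i,j,k}\mathbb{E}\left[\frac{\partial\left(T_{ik}G_{ij}^*\,e^{\omega}\right)}{\partial G_{kj}^*}\right] && (464)\\
&= \sum_{i,j,k}\mathbb{E}\left[T_{ik}\delta_{ik}\,e^{\omega} + T_{ik}G_{ij}^*\frac{\partial\omega}{\partial G_{kj}^*}\,e^{\omega}\right] && (465)\\
&= L\operatorname{tr} T\,\mathbb{E}\left[e^{\omega}\right] + \mathbb{E}\left[\sum_{k,j}\left[G^{\mathsf H}T\right]_{jk}\frac{\partial\omega}{\partial G_{kj}^*}\,e^{\omega}\right]. && (466)
\end{aligned}
$$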



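Proposition 6: Let $H \in \mathbb{C}^{N\times K}$ have i.i.d. elements $H_{ij} \sim \mathcal{CN}(0,1)$ and define the functionals
$Q(x) = \left(\frac{1}{K}HH^{\mathsf H} + xI_N\right)^{-1}$ and $\tilde Q(x) = \left(\frac{1}{K}H^{\mathsf H}H + xI_K\right)^{-1}$ for $x > 0$. Further, let $C, D \in \mathbb{C}^{N\times N}$
and $\tilde C, \tilde D \in \mathbb{C}^{K\times K}$. Then, for $u, v > 0$ and any nonnegative integer $m$, the following
holds:

$$
\begin{aligned}
&\text{(i)} & \operatorname{Var}\left[\frac{1}{K}\operatorname{tr} CQ(u)DQ(v)^m\right] &\le 2\,\frac{\left(\sqrt{v/u} + m\right)^2\|D\|^2}{u^2v^{2m+1}K^3}\operatorname{tr} CC^{\mathsf H}\\
&\text{(ii)} & \operatorname{Var}\left[\frac{1}{K}\operatorname{tr} \tilde C\tilde Q(u)\tilde D\tilde Q(v)^m\right] &\le 2\,\frac{\left(\sqrt{v/u} + m\right)^2\|\tilde D\|^2}{u^2v^{2m+1}K^3}\operatorname{tr} \tilde C\tilde C^{\mathsf H}\\
&\text{(iii)} & \operatorname{Var}\left[\frac{1}{K}\operatorname{tr} Q(u)Q(v)^m\frac{H\tilde CH^{\mathsf H}}{K}\right] &\le 2\,\frac{\left(\sqrt{v/u} + 2m\right)^2}{v^{2m+1}}\frac{1}{K^3}\operatorname{tr} \tilde C\tilde C^{\mathsf H}\\
&\text{(iv)} & \operatorname{Var}\left[\frac{1}{K}\operatorname{tr} Q(u)Q(v)^m\frac{H\tilde CH^{\mathsf H}}{K}\right] &\le 2\,\frac{\left(2\sqrt{v/u} + 2m - 1\right)^2}{u^2v^{2m-1}}\frac{1}{K^3}\operatorname{tr} \tilde C\tilde C^{\mathsf H}, \quad m \ge 1.
\end{aligned}
$$

Moreover, for $C$ and $\tilde C$ Hermitian,

$$
\begin{aligned}
&\text{(v)} & \operatorname{Var}\left[\frac{1}{K}\operatorname{tr} CQ(u)CQ(v)^m\right] &\le 2\,\frac{\left(\sqrt{v/u} + m\right)^2}{u^2v^{2m+1}K^3}\operatorname{tr} C^4\\
&\text{(vi)} & \operatorname{Var}\left[\frac{1}{K}\operatorname{tr} \tilde C\tilde Q(u)\tilde C\tilde Q(v)^m\right] &\le 2\,\frac{\left(\sqrt{v/u} + m\right)^2}{u^2v^{2m+1}K^3}\operatorname{tr} \tilde C^4.
\end{aligned}
$$

Proof: We only prove parts (i), (iii), (iv), and (v). Parts (ii) and (vi) are proved along the
same steps.
By the Poincaré-Nash inequality (Lemma 9 in Appendix D),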
1 



Var 



K 



trCQ(u)DQ(u) 



< 



E E 



h3 



d(±trCQ(u)-DQ(v) m ) 



OH?, 



d(±trCQ(u)-DQ(v) m ) 



dH, 



'j 



(467) 



E E 



1,3 



1 dQ(u) 



+ E 



K tr dH* 3 

J_ te 9Q{u) 



BQ(v) m C + — tr 



1 d(Q(v) 



K dH* 



-CQ(u)B 



K dHi 



BQ(v) m C + — tr 



1 d(Q(v) 



K dHi 



-CQ(u)B 



Making use of the following identity 

dQ(v) m 



dH 



E^) 



k -i9Q(v) 



v k=i 



dH* 



Q{v 



,m—k 



we arrive at 



ij z,=i %3 



k=l 

m 



E^ tr ^Q(^ c ^) D ^: 



fe-i 



K dH*. 
k=i v 



E ^ E ^ [g(.)-*c««)DO(«) M ] . 

^ E E [Q(v) m - k CQ(u)BQ(v) k ^] 



k=l p,q 



in 



(468) 

(469) 

(470) 
(471) 
(472) 

(473) 
(474) 



fc=i 



Similarly, we obtain 

1 d(Q(v) 



K dH i:i 



CQ(u)D = -— E [H H Q(v) m - k+1 CQ(u)T)Q(v) k ] ... (475) 



fc=i 



Following similar calculus, one then shows 



^tr^Dg(t;) m C = [Q^BQivrCQ^H]^ 
4tr^^Dg(^) m C = [H H Q(u)DQ(v) m CQ(u 



K dH, 



(476) 
(477) 
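Replacing (470), (475), (476), and (477) in (467) leads to

$$
\begin{aligned}
\operatorname{Var}\left[\frac{1}{K}\operatorname{tr} CQ(u)DQ(v)^m\right]
&\le \frac{1}{K^4}\sum_{i,j}\mathbb{E}\Bigg[\bigg|\Big[Q(u)DQ(v)^mCQ(u)H + \sum_{k=1}^{m}Q(v)^{m-k+1}CQ(u)DQ(v)^kH\Big]_{ij}\bigg|^2\\
&\qquad + \bigg|\Big[H^{\mathsf H}Q(u)DQ(v)^mCQ(u) + \sum_{k=1}^{m}H^{\mathsf H}Q(v)^{m-k+1}CQ(u)DQ(v)^k\Big]_{ji}\bigg|^2\Bigg] && (478)\\
&= \frac{1}{K^3}\mathbb{E}\left[\operatorname{tr}(R+S)(R+S)^{\mathsf H} + \operatorname{tr}(F+G)(F+G)^{\mathsf H}\right] && (479)\\
&\le \frac{1}{K^3}\mathbb{E}\left[\operatorname{tr} RR^{\mathsf H} + \operatorname{tr} SS^{\mathsf H} + 2\left|\operatorname{tr} RS^{\mathsf H}\right| + \operatorname{tr} FF^{\mathsf H} + \operatorname{tr} GG^{\mathsf H} + 2\left|\operatorname{tr} FG^{\mathsf H}\right|\right] && (480)\\
&\le \frac{1}{K^3}\mathbb{E}\left[\left(\sqrt{\operatorname{tr} RR^{\mathsf H}} + \sqrt{\operatorname{tr} SS^{\mathsf H}}\right)^2 + \left(\sqrt{\operatorname{tr} FF^{\mathsf H}} + \sqrt{\operatorname{tr} GG^{\mathsf H}}\right)^2\right] && (481)
\end{aligned}
$$

where, in the last inequality, we have used Lemma 6 in Appendix D and where we have defined

$$
\begin{aligned}
R &= \frac{1}{\sqrt K}\,Q(u)DQ(v)^mCQ(u)H && (482)\\
S &= \frac{1}{\sqrt K}\sum_{k=1}^{m}Q(v)^{m-k+1}CQ(u)DQ(v)^kH && (483)\\
F &= \frac{1}{\sqrt K}\,Q(u)C^{\mathsf H}Q(v)^mD^{\mathsf H}Q(u)H && (484)\\
G &= \frac{1}{\sqrt K}\sum_{k=1}^{m}Q(v)^kD^{\mathsf H}Q(u)C^{\mathsf H}Q(v)^{m-k+1}H. && (485)
\end{aligned}
$$

Part (i): By Lemma 6 in Appendix D, we obtain the following bound

$$
\begin{aligned}
\operatorname{tr} RR^{\mathsf H} &= \operatorname{tr} Q(u)DQ(v)^mCQ(u)\frac{HH^{\mathsf H}}{K}Q(u)C^{\mathsf H}Q(v)^mD^{\mathsf H}Q(u) && (486)\\
&\le \left\|Q(v)^mD^{\mathsf H}Q(u)^2DQ(v)^m\right\|\operatorname{tr} C^{\mathsf H}CQ(u)\frac{HH^{\mathsf H}}{K}Q(u) && (487)\\
&\le \frac{1}{v^{2m}u^2}\|D\|^2\left\|Q(u)\frac{HH^{\mathsf H}}{K}Q(u)\right\|\operatorname{tr} CC^{\mathsf H} && (488)\\
&\le \frac{1}{v^{2m}u^3}\|D\|^2\operatorname{tr} CC^{\mathsf H} && (489)
\end{aligned}
$$

and, similarly,

$$
\operatorname{tr} FF^{\mathsf H} \le \frac{1}{v^{2m}u^3}\|D\|^2\operatorname{tr} CC^{\mathsf H}. \tag{490}
$$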
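By the same arguments, we obtain

$$
\begin{aligned}
\operatorname{tr} SS^{\mathsf H} &= \sum_{k=1}^{m}\sum_{l=1}^{m}\operatorname{tr} Q(v)^{m-k+1}CQ(u)DQ(v)^k\frac{HH^{\mathsf H}}{K}Q(v)^lD^{\mathsf H}Q(u)C^{\mathsf H}Q(v)^{m-l+1} && (491)\\
&\le \sum_{k=1}^{m}\sum_{l=1}^{m}\left\|Q(v)^{2m+2-k-l}\right\|\left|\operatorname{tr} C^{\mathsf H}CQ(u)DQ(v)^k\frac{HH^{\mathsf H}}{K}Q(v)^lD^{\mathsf H}Q(u)\right| && (492)\\
&\le \sum_{k=1}^{m}\sum_{l=1}^{m}\left\|Q(v)\right\|^{2m+2-k-l}\left\|Q(u)DQ(v)^k\frac{HH^{\mathsf H}}{K}Q(v)^lD^{\mathsf H}Q(u)\right\|\operatorname{tr} CC^{\mathsf H} && (493)\\
&\le \frac{m^2}{v^{2m+1}u^2}\|D\|^2\operatorname{tr} CC^{\mathsf H} && (494)
\end{aligned}
$$

and

$$
\operatorname{tr} GG^{\mathsf H} \le \frac{m^2}{v^{2m+1}u^2}\|D\|^2\operatorname{tr} CC^{\mathsf H}. \tag{495}
$$

Combining (489), (490), (494), (495), and (481), we finally arrive at

$$
\begin{aligned}
\operatorname{Var}\left[\frac{1}{K}\operatorname{tr} CQ(u)DQ(v)^m\right]
&\le \frac{2}{K^3}\left(\sqrt{\frac{\|D\|^2\operatorname{tr} CC^{\mathsf H}}{v^{2m}u^3}} + \sqrt{\frac{m^2\|D\|^2\operatorname{tr} CC^{\mathsf H}}{v^{2m+1}u^2}}\right)^2 && (496)\\
&= 2\left(\frac{1}{v^{m}u^{3/2}} + \frac{m}{v^{m+1/2}u}\right)^2\frac{\|D\|^2}{K^3}\operatorname{tr} CC^{\mathsf H} && (497)\\
&= 2\,\frac{\left(\sqrt{v/u} + m\right)^2\|D\|^2}{u^2v^{2m+1}K^3}\operatorname{tr} CC^{\mathsf H}. && (498)
\end{aligned}
$$

Part (iii): Based on Remark 8 in Appendix D and the result of part (ii),

$$
\begin{aligned}
\operatorname{Var}\left[\frac{1}{K}\operatorname{tr} Q(u)Q(v)^m\frac{H\tilde CH^{\mathsf H}}{K}\right]
&= \operatorname{Var}\left[\frac{1}{K}\operatorname{tr} \tilde Q(v)^m\tilde C - u\,\frac{1}{K}\operatorname{tr} \tilde Q(u)\tilde Q(v)^m\tilde C\right] && (499)\\
&\le \left(\sqrt{\operatorname{Var}\left[\frac{1}{K}\operatorname{tr} \tilde Q(v)^m\tilde C\right]} + u\sqrt{\operatorname{Var}\left[\frac{1}{K}\operatorname{tr} \tilde Q(u)\tilde Q(v)^m\tilde C\right]}\right)^2 && (500)\\
&\le 2\,\frac{\left(2m + \sqrt{v/u}\right)^2}{v^{2m+1}}\frac{1}{K^3}\operatorname{tr} \tilde C\tilde C^{\mathsf H}. && (501)
\end{aligned}
$$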
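The rewriting used at the start of the proof of part (iii), which trades the $N \times N$ resolvent $Q$ for its $K \times K$ counterpart $\tilde Q$, is a deterministic matrix identity and can be verified on a single random draw. The sketch below is only an illustration under assumed names (`resolvents`, `part_iii_identity_gap`) and arbitrary small dimensions; it checks $\frac{1}{K}\operatorname{tr} Q(u)Q(v)^m\frac{H\tilde CH^{\mathsf H}}{K} = \frac{1}{K}\operatorname{tr}\tilde Q(v)^m\tilde C - u\frac{1}{K}\operatorname{tr}\tilde Q(u)\tilde Q(v)^m\tilde C$ together with the resolvent relation $Q(u)\frac{HH^{\mathsf H}}{K} = I_N - uQ(u)$.

```python
import numpy as np

def resolvents(H, x):
    """Return Q(x) = (HH^H/K + x I_N)^{-1} and Qt(x) = (H^H H/K + x I_K)^{-1}."""
    N, K = H.shape
    Q = np.linalg.inv(H @ H.conj().T / K + x * np.eye(N))
    Qt = np.linalg.inv(H.conj().T @ H / K + x * np.eye(K))
    return Q, Qt

def part_iii_identity_gap(H, Ct, u, v, m):
    """Absolute difference between both sides of the identity behind (499)."""
    N, K = H.shape
    Qu, Qtu = resolvents(H, u)
    Qv, Qtv = resolvents(H, v)
    Qv_m = np.linalg.matrix_power(Qv, m)
    Qtv_m = np.linalg.matrix_power(Qtv, m)
    lhs = np.trace(Qu @ Qv_m @ H @ Ct @ H.conj().T) / K**2
    rhs = np.trace(Qtv_m @ Ct) / K - u * np.trace(Qtu @ Qtv_m @ Ct) / K
    return abs(lhs - rhs)

rng = np.random.default_rng(1)
N, K = 3, 5
H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
Ct = rng.standard_normal((K, K)) + 1j * rng.standard_normal((K, K))

gap = part_iii_identity_gap(H, Ct, u=1.3, v=0.7, m=2)
Qu, _ = resolvents(H, 1.3)
resolvent_gap = np.abs(Qu @ H @ H.conj().T / K - (np.eye(N) - 1.3 * Qu)).max()
print(gap, resolvent_gap)
```

Both printed gaps should be at the level of floating-point round-off.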
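Part (iv): Based on Remark 8 in Appendix D and the result of part (ii), for $m \ge 1$,

$$
\begin{aligned}
\operatorname{Var}\left[\frac{1}{K}\operatorname{tr} Q(u)Q(v)^m\frac{H\tilde CH^{\mathsf H}}{K}\right]
&= \operatorname{Var}\left[\frac{1}{K}\operatorname{tr} \tilde Q(u)\tilde Q(v)^{m-1}\tilde C - v\,\frac{1}{K}\operatorname{tr} \tilde Q(u)\tilde Q(v)^m\tilde C\right] && (502)\\
&\le \left(\sqrt{\operatorname{Var}\left[\frac{1}{K}\operatorname{tr} \tilde Q(u)\tilde Q(v)^{m-1}\tilde C\right]} + v\sqrt{\operatorname{Var}\left[\frac{1}{K}\operatorname{tr} \tilde Q(u)\tilde Q(v)^m\tilde C\right]}\right)^2 && (503)\\
&\le 2\,\frac{\left(2m - 1 + 2\sqrt{v/u}\right)^2}{u^2v^{2m-1}}\frac{1}{K^3}\operatorname{tr} \tilde C\tilde C^{\mathsf H}. && (504)
\end{aligned}
$$

Part (v): Assume now $D = C$ and $C = C^{\mathsf H}$. Then, from the proof of part (i),

$$
\begin{aligned}
\operatorname{tr} RR^{\mathsf H} &= \operatorname{tr} Q(u)CQ(v)^mCQ(u)\frac{HH^{\mathsf H}}{K}Q(u)CQ(v)^mCQ(u) && (505)\\
&\le \left\|Q(u)\frac{HH^{\mathsf H}}{K}Q(u)\right\|\operatorname{tr} CQ(v)^mCQ(u)^2CQ(v)^mC && (506)\\
&\le \frac{1}{u}\left\|Q(u)^2\right\|\operatorname{tr} CQ(v)^mC^2Q(v)^mC && (507)\\
&\le \frac{1}{u^3}\left\|Q(v)^m\right\|\operatorname{tr} C^4Q(v)^m && (508)\\
&\le \frac{1}{v^{2m}u^3}\operatorname{tr} C^4 && (509)
\end{aligned}
$$

and, by symmetry,

$$
\operatorname{tr} FF^{\mathsf H} \le \frac{1}{v^{2m}u^3}\operatorname{tr} C^4. \tag{510}
$$

Following similar calculus, we arrive at

$$
\begin{aligned}
\operatorname{tr} SS^{\mathsf H} &= \sum_{k=1}^{m}\sum_{l=1}^{m}\operatorname{tr} Q(v)^{m-k+1}CQ(u)CQ(v)^k\frac{HH^{\mathsf H}}{K}Q(v)^lCQ(u)CQ(v)^{m-l+1} && (511)\\
&\le \sum_{k,l}\left\|Q(v)^{2m+2-k-l}\right\|\operatorname{tr} CQ(u)CQ(v)^k\frac{HH^{\mathsf H}}{K}Q(v)^lCQ(u)C && (512)\\
&\le \sum_{k,l}\frac{1}{v^{2m+2-k-l}}\left\|Q(v)^{k+l-1}\right\|\operatorname{tr} CQ(u)C^2Q(u)C && (513)\\
&\le \sum_{k,l}\frac{1}{v^{2m+1}u^2}\operatorname{tr} C^4 && (514)\\
&= \frac{m^2}{v^{2m+1}u^2}\operatorname{tr} C^4 && (515)
\end{aligned}
$$

and

$$
\operatorname{tr} GG^{\mathsf H} \le \frac{m^2}{v^{2m+1}u^2}\operatorname{tr} C^4. \tag{516}
$$

Combining (509), (510), (515), (516), and (481) leads to

$$
\begin{aligned}
\operatorname{Var}\left[\frac{1}{K}\operatorname{tr} CQ(u)CQ(v)^m\right]
&\le 2\,\frac{\left(\sqrt v + m\sqrt u\right)^2}{u^3v^{2m+1}}\frac{1}{K^3}\operatorname{tr} C^4 && (517)\\
&= 2\,\frac{\left(\sqrt{v/u} + m\right)^2}{u^2v^{2m+1}}\frac{1}{K^3}\operatorname{tr} C^4. && (518)
\end{aligned}
$$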

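Proposition 7: Let $\{H^n\}_{n=1}^{\infty}$, where $H^n \in \mathbb{C}^{N\times K}$ has i.i.d. elements $H^n_{ij} \sim \mathcal{CN}(0,1)$, and
define $Q_n(x) = \left(\frac{1}{K}H^n(H^n)^{\mathsf H} + xI_N\right)^{-1}$ for $x > 0$. Let $\{C_n\}_{n=1}^{\infty}$, where $C_n \in \mathbb{C}^{K\times K}$. Then,
for $u \ge \sigma^2 > 0$ and any nonnegative integer $m$, the following holds as $n \to \infty$: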



E 



nt J2\m 



K 



tr Q»Q> 



m H n C n (H n ) H 



K 



(ii) 

where, for m > 1, 



E 



1 

K 



tr Q n (u)Q n (a 2 ) m 



lm {u) -trC n + 

K 



S m (u) + O 



' -trOfO) H 



u- 



! K 5 



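$$
\gamma_m(u) = \delta_{m-1}(u) - \sigma^2\delta_m(u)
$$

$$
\delta_m(u) = \frac{\delta_{m-1}(u)\left[1 + \delta_0(\sigma^2)\right] + \sum_{k=1}^{m-1}\left[\delta_{k-1}(u) - \sigma^2\delta_k(u)\right]\delta_{m-k}(\sigma^2)}{1 - c + \sigma^2\left[1 + \delta_0(\sigma^2)\right] + u\delta_0(u)}
$$

and

$$
\gamma_0(u) = c - u\delta_0(u)
$$

with $\delta_0(u)$ as defined in Theorem 5 in Appendix D.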

Proof: In order to simplify the notation, we drop the dependence on $n$, e.g., we write $H$
instead of $H^n$. We begin by standard Gaussian calculus based on the integration by parts formula
(Lemma 8 in Appendix D):

E 



i,j,k,r,s 



^ E ^ E 



i,j,k,r,s 



d{H: k Q{u) rs [Q {a 



2\ml 



^ E ^ E 

i,j,k,r,s 

+H* rk Q(u) rs 



— trCE 
ii 



#C# H 1 



-tiQ(u)Q(a 2 ) m 



+ ^E E f[ OTH wi 



— E 

a[g(a 2 r 



2\m 



trQ(u)Q(a 2 ) 



To continue, we will develop the term 9 ^ Q( q H 1 



as follows: 



d[Q{a 



2\ml 



oh* 



EE [Q(^ 1 ] sP ^^[Q^ 2 r- k ] qi 



k=l p,q 

m 



2\m-fc+l 



fc=l p,g 



k=l 

Replacing (525) in (522), we arrive at

E 



l tr g( M )g(a 2 )-^^ 



-LrCE 

K 



fc=i 



-trQ(u)Q(a 2 ) m 



E 



itr Q(u)^f^^trQ(u)Q(a 2 r 



±trQ(u)Q(ar^^^Q(° 2 ) m - k+1 
By Proposition 6 in Appendix C-A and Remark 8 in Appendix D, we have



E 



ltrQ( M )^^ltrQ( M )Q(a 2 r 



E 



1 .HCH 1 

K ^ y 1 K 



E 



-tcQ(u)Q(a 2 ) m 



1 



E 



ItrQ^QkE^lltrQm-k+i 
K VV K K * 



—trQ(u)Q k 

K K 



(527) 



E 



E 



K 



trQ 



|TO— fc + 1 



^ ,rGCH ) 



and, thus, 

E 



ltrQ( M )Q(a 2 r^=^ 



if 



-trCE 



2\m 



A" 



trQ(u)Q(a 2 ) 



— E 



1 .HCH H 



E 



±trQ(u)Q(a 2 ) m 



fe=i 



-trQ(u)Q(a 2 ) k ^^- 



E 



-trQ(a 2 ) m - fc+1 
ii 



u 2 K 5 



Define the following quantities 

1 



K 



l°m («, C) = E 

C («) = E 
which satisfy the relations 

5°_ 1 (a 2 ) = c 

1 ° m (u,I K )=E 

= E 



trQ(u)Q(o- 



K 



m = 0,1,2, 



-trQ(u)Q(a 2 r 



m 



-1,0,1,2, 



±trQ(u)Q(a 2 r^f- 



2\m 



K 



trQ(a 2 ) 



mE 



if 



trQ(u)Q{a 



2\m 



= 5° m ^(a 2 ) - u5° m (u) , Vro. 
For $m \ge 1$, we also have from the relations in (224)



(528) 



trCC H j . (529) 

(530) 
(531) 

(532) 
(533) 

(534) 
(535) 

(536) 
Using these definitions, we can express (529) as



7^ («, C) = ^trC^(«) - 7o° («, C) ^(ti) - f>° («, C) + ^-^trCC^ . 

(537) 

Evaluating the last equation for $m = 0$ and collecting the terms in $\gamma_0^{\circ}(u, C)$ on one side, leads
to



S° Q (u) 1 „ „ / / 1 



^("• c) = TTWi ,rC + ° (^ ,rCCH ) ' (538) 

By Theorem 5 in Appendix D,

5 » = J («) + o(^). (539) 



Thus, we can define 



1 + 5 (u) 



such that 



7o° (u, C) = 70 (u) ^trC + O U -i^trCC H j (541) 



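where we use the fact that $\left|\frac{1}{K}\operatorname{tr} C\right| \le \sqrt{\frac{1}{K}\operatorname{tr} CC^{\mathsf H}}$ and $u^{-4} \le u^{-1}\sigma^{-6}$ (since $u \ge \sigma^2$) to discard
the term $O\left(u^{-4}K^{-3}\operatorname{tr} C\right)$.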

For $m \ge 1$, we can gather the terms involving $\gamma_m^{\circ}(u, C)$ in (537) on one side, replace $\gamma_0^{\circ}(u, C)$
by $\gamma_0(u)\frac{1}{K}\operatorname{tr} C$ and $\delta_0^{\circ}(u)$ by $\delta_0(u)$, to obtain, iteratively on $m$,

7m(M ' C) - 1 + 6 + °{y^ trCC )■ 

(542) 

From the last equation, we can obtain a recursive expression of $\delta_m^{\circ}(u)$ by letting $C = I_K$ and
using the relations (535) and (536):

„ f , _ gUfr) [i + ^0 (^ 2 )] + Eg K-iCtQ - ° 2 t»] ^ ro M \ f ^ 

m[U) l-C+^[l + 5 (^)]+uS (u) +U \uK 2 )- P } 

Note that the denominator of the RHS of the last equation is strictly positive (see Property 1 (i)-(iii)
in Appendix D). For $m = 1$, we obtain with the help of (539)

_ W [l + W] + Q (J_\ (544) 

*[! + *(«) M] -of IV (545) 



l-c + a 2 [l + (5o(a 2 )]+M5o(M) V«^ 2 
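Due to the recursive definition of $\delta_m^{\circ}(u)$, we can now conclude that

$$
\delta_m^{\circ}(u) = \delta_m(u) + O\left(\frac{1}{uK^2}\right) \tag{546}
$$

where

$$
\delta_m(u) = \frac{\delta_{m-1}(u)\left[1 + \delta_0(\sigma^2)\right] + \sum_{k=1}^{m-1}\left[\delta_{k-1}(u) - \sigma^2\delta_k(u)\right]\delta_{m-k}(\sigma^2)}{1 - c + \sigma^2\left[1 + \delta_0(\sigma^2)\right] + u\delta_0(u)}, \quad m \ge 1. \tag{547}
$$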



Using (546) in (542), we have so far proved that, for $m \ge 1$,

7™ («, C) 



jtrC5 m (u) - 7o («) £tr C5 m {u) - 7g K C) ft^ (a 2 ) | g / / 1 trCCH \ 



1 + 5 (a 2 ) \ V U 2 K$ , 



(548) 

where we have relied on the fact that 7^ (u, C) ^2 < ^3 tr C = C ^^strCC H j. In 
particular, for m = 1, we obtain 



7„ («, C) = 7m (u) ^trC + oU ^trCC j (550) 



Iterating the recursion m — 1 times, we have proved that 

/-^rtrca 

where 



) 



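$$
\gamma_m(u) = \frac{\delta_m(u)\left(1 - \gamma_0(u)\right) - \sum_{k=1}^{m-1}\gamma_k(u)\,\delta_{m-k}(\sigma^2)}{1 + \delta_0(\sigma^2)}, \quad m \ge 1. \tag{551}
$$

Using now the relation $\gamma_0(u) = c - u\delta_0(u)$ (see Property 1 (iv) in Appendix D), we write the
last equation as

$$
\gamma_m(u)\left(1 + \delta_0(\sigma^2)\right) = \delta_m(u)\left(1 - c + u\delta_0(u)\right) - \sum_{k=1}^{m-1}\gamma_k(u)\,\delta_{m-k}(\sigma^2). \tag{552}
$$

Adding $\delta_m(u)\sigma^2\left[1 + \delta_0(\sigma^2)\right]$ to both sides, we can express $\delta_m(u)$ as

$$
\delta_m(u) = \frac{\left[\gamma_m(u) + \sigma^2\delta_m(u)\right]\left[1 + \delta_0(\sigma^2)\right] + \sum_{k=1}^{m-1}\gamma_k(u)\,\delta_{m-k}(\sigma^2)}{1 - c + \sigma^2\left[1 + \delta_0(\sigma^2)\right] + u\delta_0(u)}. \tag{553}
$$

Equating (553) and (547), we can see that $\gamma_m(u)$ must satisfy the following relation

$$
\gamma_m(u) = \delta_{m-1}(u) - \sigma^2\delta_m(u), \quad m \ge 1. \tag{554}
$$

This terminates the proof. ■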
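The recursion for $\delta_m(u)$ and $\gamma_m(u)$ is straightforward to evaluate numerically. The sketch below is a minimal illustration under stated assumptions: $\delta_0(u)$ is taken to be the positive root of $u\delta^2 + (1 - c + u)\delta - c = 0$, which is what the two relations $\gamma_0(u) = c - u\delta_0(u)$ and $\gamma_0(u) = \delta_0(u)/(1 + \delta_0(u))$ imply (Theorem 5 itself is not reproduced in this section), and the function names are hypothetical. Since $\delta_m^{\circ}(u)$ is the expectation of $\frac{1}{K}\operatorname{tr} Q(u)Q(\sigma^2)^m$, the values at $u = \sigma^2$ should match $-\delta_0'(\sigma^2)$ for $m = 1$ and $\frac{1}{2}\delta_0''(\sigma^2)$ for $m = 2$, which gives a simple consistency check.

```python
import math

def delta0(u, c):
    """Positive root of u*d**2 + (1 - c + u)*d - c = 0 (assumed form of delta_0)."""
    b = 1.0 - c + u
    return (-b + math.sqrt(b * b + 4.0 * u * c)) / (2.0 * u)

def _build(arg, sigma2, c, m_max, d_sigma=None):
    """delta_m(arg) for m = 0..m_max via the recursion (547).

    d_sigma holds delta_j(sigma2); when arg == sigma2 the list under
    construction is used, since only indices below m are needed.
    """
    d0_s = delta0(sigma2, c)
    d = [delta0(arg, c)]
    ref = d if d_sigma is None else d_sigma
    denom = 1.0 - c + sigma2 * (1.0 + d0_s) + arg * d[0]
    for m in range(1, m_max + 1):
        tail = sum((d[k - 1] - sigma2 * d[k]) * ref[m - k] for k in range(1, m))
        d.append((d[m - 1] * (1.0 + d0_s) + tail) / denom)
    return d

def delta_gamma(u, sigma2, c, m_max):
    """Return ([delta_0..delta_m](u), [gamma_0..gamma_m](u))."""
    d_sigma = _build(sigma2, sigma2, c, m_max)
    d_u = _build(u, sigma2, c, m_max, d_sigma)
    gamma = [c - u * d_u[0]]
    gamma += [d_u[m - 1] - sigma2 * d_u[m] for m in range(1, m_max + 1)]
    return d_u, gamma

d_u, g_u = delta_gamma(u=1.3, sigma2=0.7, c=0.5, m_max=2)
d_s, _ = delta_gamma(u=0.7, sigma2=0.7, c=0.5, m_max=2)
print(d_u, g_u, d_s)
```

The helper computes the sequence at $\sigma^2$ first because (547) needs $\delta_{m-k}(\sigma^2)$ for every lower index before the sequence at a general $u$ can be built.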
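B. Proof of Proposition 5 in Appendix B-C

Our goal is to provide an asymptotically exact approximation of the quantity $\mathbb{E}\left[\operatorname{tr} Q(u)\frac{HH^{\mathsf H}}{K}e^{x}\right]$.
The major difficulty lies in a missing scaling factor of $1/K$ in front of the trace term. For this
reason, we cannot directly apply the Cauchy-Schwarz inequality together with the variance bound
in Proposition 6 in Appendix C-A to decouple $\mathbb{E}\left[\operatorname{tr} Q(u)\frac{HH^{\mathsf H}}{K}e^{x}\right]$ as $\mathbb{E}\left[\operatorname{tr} Q(u)\frac{HH^{\mathsf H}}{K}\right]\mathbb{E}\left[e^{x}\right] + \epsilon_n$,
where $\epsilon_n \to 0$ (see also Remark 8 in Appendix D). Therefore, we need to develop $\mathbb{E}\left[\operatorname{tr} Q(u)\frac{HH^{\mathsf H}}{K}e^{x}\right]$
further with the help of the integration by parts formula (Lemma 8 in Appendix D).
By the product rule of differentiation, Lemma 8 and Lemma 10 in Appendix D,

$$
\begin{aligned}
\mathbb{E}\left[\operatorname{tr} Q(u)\frac{HH^{\mathsf H}}{K}e^{x}\right]
&= \frac{1}{K}\sum_{i,j}\mathbb{E}\left[H_{ij}\left[H^{\mathsf H}Q(u)\right]_{ji}e^{x}\right] && (555)\\
&= \frac{1}{K}\sum_{i,j}\mathbb{E}\left[\frac{\partial\left(\left[H^{\mathsf H}Q(u)\right]_{ji}e^{x}\right)}{\partial H_{ij}^*}\right] && (556)\\
&= \mathbb{E}\left[\left(\operatorname{tr} Q(u) - \frac{1}{K}\operatorname{tr} Q(u)\frac{HH^{\mathsf H}}{K}\operatorname{tr} Q(u)\right)e^{x}\right] + \frac{1}{K}\sum_{i,j}\mathbb{E}\left[\frac{\partial x}{\partial H_{ij}^*}\left[H^{\mathsf H}Q(u)\right]_{ji}e^{x}\right] && (557)\\
&= \mathbb{E}\left[\left(\frac{N}{u} - \frac{1}{u}\operatorname{tr} Q(u)\frac{HH^{\mathsf H}}{K} - \frac{1}{K}\operatorname{tr} Q(u)\frac{HH^{\mathsf H}}{K}\operatorname{tr} Q(u)\right)e^{x}\right] + \frac{1}{K}\sum_{i,j}\mathbb{E}\left[\frac{\partial x}{\partial H_{ij}^*}\left[H^{\mathsf H}Q(u)\right]_{ji}e^{x}\right]. && (558)
\end{aligned}
$$

Gathering the terms involving $\operatorname{tr} Q(u)\frac{HH^{\mathsf H}}{K}$ on the LHS yields

$$
\mathbb{E}\left[\operatorname{tr} Q(u)\frac{HH^{\mathsf H}}{K}\left(1 + \frac{1}{u} + \frac{1}{K}\operatorname{tr} Q(u)\right)e^{x}\right] = \frac{N}{u}\mathbb{E}\left[e^{x}\right] + \frac{1}{K}\sum_{i,j}\mathbb{E}\left[\frac{\partial x}{\partial H_{ij}^*}\left[H^{\mathsf H}Q(u)\right]_{ji}e^{x}\right]. \tag{559}
$$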

We now develop 



dm 



it 



KK 



n 1 [Qi^H^-t^Ua^Hl 



which is obtained from 



n 1 d(HH") 
trQ— '- 

KK ^ dHt 



Teh™ 



'J 



(560) 
(561) 
and



d(9rY _<9(2c-^trQ 



OH. 



'j 



2^ 
K 2 



Using (560) and (229), we arrive at



E 



i,3 3 



u) \ ..e A 

3 l 



iU — E 

K 



^trQ(u)Q(a 2 )^e* 



+ 



By Proposition 6 in Appendix C-A and Lemma 7 in Appendix D,



E 



E 



Ol — 

un 



and further by Proposition 7 in Appendix C-A,



E 



1 . ,^HH H 

—trQ(u)Q 

K VV K 



5 Q (u) - a 2 6 1 (u) + O 



Thus, from (559), (564), and (566),

E 



(562) 
(563) 

(564) 



(565) 



(566) 



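$$
\mathbb{E}\left[\operatorname{tr} Q(u)\frac{HH^{\mathsf H}}{K}\left(1 + \frac{1}{u} + \frac{1}{K}\operatorname{tr} Q(u)\right)e^{x}\right]
= \frac{N}{u}\mathbb{E}\left[e^{x}\right] + it\sqrt{\frac{n}{K}}\left(\delta_0(u) - \sigma^2\delta_1(u)\right)\mathbb{E}\left[e^{x}\right] + O\left(\frac{1}{un}\right). \tag{567}
$$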

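We will now develop the LHS of (567) in an alternative form. Let us first define the following
quantities:

$$
\Psi = \frac{1}{K}\operatorname{tr} Q(u), \qquad \Phi = \operatorname{tr} Q(u)\frac{HH^{\mathsf H}}{K}. \tag{568}
$$

Using these definitions, we can express the LHS of (567) as

$$
\begin{aligned}
&\mathbb{E}\left[\operatorname{tr} Q(u)\frac{HH^{\mathsf H}}{K}\left(1 + \frac{1}{u} + \frac{1}{K}\operatorname{tr} Q(u)\right)e^{x}\right] && (569)\\
&\qquad = \left(1 + \frac{1}{u}\right)\mathbb{E}\left[\Phi e^{x}\right] + \mathbb{E}\left[\Phi\Psi e^{x}\right]. && (570)
\end{aligned}
$$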
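We can develop the second term on the RHS of the last equation as

$$
\begin{aligned}
\mathbb{E}\left[\Phi\Psi e^{x}\right]
&= \mathbb{E}[\Psi]\,\mathbb{E}\left[\Phi e^{x}\right] + \mathbb{E}\left[\Phi\left(\Psi - \mathbb{E}[\Psi]\right)e^{x}\right] && (571)\\
&\overset{(a)}{=} \mathbb{E}[\Psi]\,\mathbb{E}\left[\Phi e^{x}\right] + \mathbb{E}[\Phi]\,\mathbb{E}\left[\left(\Psi - \mathbb{E}[\Psi]\right)e^{x}\right] + O\left((un)^{-1}\right) && (572)\\
&\overset{(b)}{=} \mathbb{E}[\Psi]\,\mathbb{E}\left[\Phi e^{x}\right] - \mathbb{E}[\Phi]\,\mathbb{E}[\Psi]\,\mathbb{E}\left[e^{x}\right] + \mathbb{E}[\Phi]\,\mathbb{E}\left[\left(\frac{1}{u}\frac{N}{K} - \frac{1}{u}\frac{1}{K}\operatorname{tr} Q(u)\frac{HH^{\mathsf H}}{K}\right)e^{x}\right] + O\left((un)^{-1}\right) && (573)\\
&= \mathbb{E}[\Psi]\,\mathbb{E}\left[\Phi e^{x}\right] - \mathbb{E}[\Phi]\,\mathbb{E}[\Psi]\,\mathbb{E}\left[e^{x}\right] + \frac{1}{u}\frac{N}{K}\mathbb{E}[\Phi]\,\mathbb{E}\left[e^{x}\right] - \frac{1}{u}\frac{1}{K}\mathbb{E}[\Phi]\,\mathbb{E}\left[\Phi e^{x}\right] + O\left((un)^{-1}\right) && (574)\\
&= \mathbb{E}\left[\Phi e^{x}\right]\left(\mathbb{E}[\Psi] - \frac{1}{u}\frac{1}{K}\mathbb{E}[\Phi]\right) - \mathbb{E}[\Phi]\,\mathbb{E}[\Psi]\,\mathbb{E}\left[e^{x}\right] + \frac{1}{u}\frac{N}{K}\mathbb{E}[\Phi]\,\mathbb{E}\left[e^{x}\right] + O\left((un)^{-1}\right) && (575)\\
&\overset{(c)}{=} \mathbb{E}\left[\Phi e^{x}\right]\left(\delta_0(u) - \frac{1}{u}\gamma_0(u)\right) - K\gamma_0(u)\,\delta_0(u)\,\mathbb{E}\left[e^{x}\right] + \frac{N}{u}\gamma_0(u)\,\mathbb{E}\left[e^{x}\right] + O\left((un)^{-1}\right) && (576)
\end{aligned}
$$

where (a) is due to Lemma 7 in Appendix D and Proposition 6 in Appendix C-A, (b) follows
from (224), and (c) follows by Proposition 7 in Appendix C-A and the fact that $|e^{x}| \le 1$. Thus,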



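$$
\begin{aligned}
\mathbb{E}\left[\Phi\left(1 + \frac{1}{u} + \Psi\right)e^{x}\right]
&= \left(1 + \frac{1}{u} - \frac{c}{u} + 2\delta_0(u)\right)\mathbb{E}\left[\Phi e^{x}\right]\\
&\quad + N\left(\frac{c}{u} - 2\delta_0(u) + \frac{u}{c}\delta_0(u)^2\right)\mathbb{E}\left[e^{x}\right] + O\left((un)^{-1}\right). && (577)
\end{aligned}
$$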

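Equating the RHS of (577) and the RHS of (567), and solving for $\mathbb{E}\left[\Phi e^{x}\right]$ leads to

$$
\begin{aligned}
\mathbb{E}\left[\Phi e^{x}\right]
&= N\,\frac{1 - c + 2u\delta_0(u) - \frac{u^2}{c}\delta_0(u)^2}{1 - c + u\left(1 + 2\delta_0(u)\right)}\,\mathbb{E}\left[e^{x}\right]\\
&\quad + it\sqrt{\frac{n}{K}}\,\frac{u\left(\delta_0(u) - \sigma^2\delta_1(u)\right)}{1 - c + u\left(1 + 2\delta_0(u)\right)}\,\mathbb{E}\left[e^{x}\right] + O\left(\frac{1}{un}\right) && (578)
\end{aligned}
$$

which terminates the proof.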
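The last step of the proof is pure algebra: a linear equation in $\mathbb{E}[\Phi e^{x}]$ is solved after equating two expressions. A small numerical check can confirm that the coefficients in the solution are consistent with the equation they came from. The sketch below assumes the reading of (567), (577), and (578) in which the unknown is multiplied by $1 + \frac{1-c}{u} + 2\delta_0(u)$, the $N\,\mathbb{E}[e^{x}]$ terms are $\frac{1}{u}$ and $\frac{c}{u} - 2\delta_0(u) + \frac{u}{c}\delta_0(u)^2$, and the remaining term carries the factor $\delta_0(u) - \sigma^2\delta_1(u)$; the function name and the numerical inputs are hypothetical, and the identity holds for any value of $\delta_0$, so an arbitrary positive number is used.

```python
def prop5_coefficients(u, c, d0):
    """Coefficients obtained when solving the linear equation for E[Phi e^x].

    a multiplies N * E[e^x]; b multiplies
    it * sqrt(n/K) * (delta_0(u) - sigma^2 * delta_1(u)) * E[e^x].
    """
    denom = 1.0 - c + u * (1.0 + 2.0 * d0)
    a = (1.0 - c + 2.0 * u * d0 - (u**2 / c) * d0**2) / denom
    b = u / denom
    return a, b

u, c, d0 = 1.7, 0.6, 0.23          # arbitrary illustrative values
a, b = prop5_coefficients(u, c, d0)

lhs_factor = 1.0 + (1.0 - c) / u + 2.0 * d0            # multiplies E[Phi e^x]
rhs_N_term = 1.0 / u - (c / u - 2.0 * d0 + (u / c) * d0**2)

residual_a = lhs_factor * a - rhs_N_term   # should vanish
residual_b = lhs_factor * b - 1.0          # should vanish
print(residual_a, residual_b)
```

Both residuals vanish up to round-off, which confirms that dividing by $1 + \frac{1-c}{u} + 2\delta_0(u)$ is equivalent to multiplying numerator and denominator by $u$, the source of the $u^2/c$ and the extra factor $u$ in the solved form.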



C. Proof of Proposition |?] in Appendix \B-B\ 

Similar to the proof of Proposition 5 in Appendix C-B, we want to derive asymptotically
exact approximations of $\mathbb{E}\left[\operatorname{tr} Q(u)\frac{HH^{\mathsf H}}{K}e^{\gamma_n^{X^n,H^n}}\right]$ (part (i)) and $\mathbb{E}\left[\operatorname{tr} Q(\sigma^2)\frac{HAH^{\mathsf H}}{K}e^{\gamma_n^{X^n,H^n}}\right]$ (part
(ii)). It turns out that part (i) is needed for the proof of part (ii). The additional difficulty
compared to the proof of Proposition 5 results from a) the much more involved expression of
$\gamma_n^{X^n,H^n}$ (see (308)) compared to $x$ (see (406)) which depends on $X$ whose spectral norm cannot
be bounded from above and b) the presence of the matrix $A$ in the trace (part (ii)) which
introduces additional terms which must be controlled.

In the proofs below, we will often use the notation $P(t)$ or $P_i(t)$ to refer to some non-zero
polynomials in $t$ with nonnegative coefficients. These polynomials may take different values
from one equation to the next.

Proof of part (i): Mimicking the steps leading to (558) and (559), we easily obtain

1 



E 



trQ(w)-^- ( 1 + - + ^trQ(w) | < 



X",H" 



N 
— E 

u 



'■J 



fry. 



(579) 



Recall that 7* 



X n ,H n t 



V (9n n,Hn ) +i^ K * n,Hn (13081) . From the standard derivation 



rules as provided in Lemma 10 and Corollary 1 in Appendix D, denoting $Q = Q(\sigma^2)$ for brevity,



Q#— XX H 

nK 



Q±HAH»Q±H 



(580) 



Similarly, 



K 




2a 


2 


H 






2(T 2 




if 


Q 



2 



^ K 



1 

if 



if 



nK 
2a 4 



2a 2 
nK 



Q^-HXX H H H Q 2 ^-H 



Q 2 —HXX H H H Q—H 

K * K 



(581) 



+ 



— Q 2 ^H-XX H Q 

K K n 



2a^ 
K 



Q-^—HXX H H H Q 2 -j-H 

Kn K 



(582) 



where, in the last equality, we used $I_N - \frac{1}{K}QHH^{\mathsf H} = \sigma^2Q$, $I_K - \frac{1}{K}H^{\mathsf H}QH = \sigma^2\tilde Q$, and
$QH = H\tilde Q$. Following the same derivation, we also have



dn 



X n ,H n 
n 



3a 2 



y/nK 3 
3a 4 



3a 4 



1.1 



Vn^K 3 



Q^-HXX H H H Q 3 l,H 
K K 



+ 



Vn 3 K 3 



-^-HH"Q 3 -j-HXX H Q 
K K 



3a 2 



Vn 3 K : 



(583) 

±HH H Q 2 -^HXX H H H Q 2 ±-H 
K K K 

(584) 
Using these results, the second term on the RHS of (579) can be developed as follows:



i-j 



11 —[H"Q( 



dm 



u)\ ..e' 
31 



HXX. H H" 

nK 



-Q(u)e> 



E 



— trO 

K ^ K 



HAH H HH 



Q——Q(u)e' 



K 



+ 
(585) 



1 HH V 



Q{u)e^ 



— E 



a 2 ^HAH V 



■QQ(u)e 



X",H" 



-so 

(586) 

for some polynomial $P(t)$, where (a) follows from the derivative of $\gamma_n^{X^n,H^n}$ as developed
in (580)-(584) and the observations that all terms resulting from $(\theta_n^{X^n,H^n})^2$ and $\kappa_n^{X^n,H^n}$ are
$O\left((uK)^{-1}\right)$ and $O\left(u^{-1}K^{-2}\right)$, respectively, and (b) follows from $Q\frac{HH^{\mathsf H}}{K} = I_N - \sigma^2Q$ (see (224))
and the definition of $A = I_K - \frac{1}{n}XX^{\mathsf H}$.

Based on Proposition 6 in Appendix C-A and Lemma 7 in Appendix D, we find the following
estimations:



'pyy 

uK 



E 



1 HH V 



Q(u)e 



X' l ,H" 



E 



E 



E 



1 . ,^HH H 
-txQ(u)Q— 



a 2 2 HAH V 

— trQ(u)Q 2 

K ^ V K 





r X » B »- 




E 


g7n 





(587) 





- x n ff n- 




E 


g7n 


+ ( 



It 



'K 3 



By Proposition 7 in Appendix C-A,

E 



1 „. .^HH^ 
-KQ(u)Q— 



E 



a 2 „, ,„ 2 HAH" 



^Q{u)Q 2 R 



8 {u) - (PSxiu) + O 



<? 2 l2 (w) — trA + C 



iiif 2 



Combining (579), (586), (587), (588), (589), and (590), we obtain
HH" 



trA 2 



E 



trQ(u) 



A" 



1 + - + ^trQ(u) ) e 7 " 

II A 



at 

— E 

it 



+ 



it\j — I5 (u) - <t%(u) - a 2 j 2 (u) —trA ) E 



uVK 



P(t) 



for some other polynomial P(t), where we used in particular \/K~Hv A 2 < 1/yK. 



trA 2 
(588) 

(589) 
(590) 



(591) 
Next, we consider the LHS of (579). Let us first define the following quantities:

$$
\Psi = \frac{1}{K}\operatorname{tr} Q(u), \qquad \Phi = \operatorname{tr} Q(u)\frac{HH^{\mathsf H}}{K}. \tag{592}
$$

Using these definitions, we can express the LHS of (579) as



E 



trQ(w 



is: 



1 + - + ^trQ{u) ) e 7 '" 
u K 



1 + 



E 



$e 7n 



+ E 



x".s" 



(593) 



Similar to the proof of Proposition 5 in Appendix C-B, we can develop the second term on
the RHS of the last equation as follows:



E 



(a) 



E[*]E 
E[¥]E 



+ E 



-E[tf])e 



+ E [$] E 



E[tf]E 



E 



Ik 

u 



$e 7 



1* 



E [$] E [*] E 



1 AT 11 n/ HH H 

txQ(u) 

uK uK K 



E [$] E [*] E 



e 7 ™ 



M 2 K 



1 AT 
+ — — E <3> E 

It A 



3 7n 



E 



$e 7n 







« 2 a: 



E [*] E 

u 



1 AT 
+ -— E $ E 



A' 




E [$] E [*] E 



e 7» 



1 



(6) 



E 



5o(«) 7o(X 

it 



+ -cK lo («) E 
it 



(V) f 25 Q {u)--\E <Iv- 



O 



u 2 K 
-K l0 (u) 6 (u)E 

\uK 



(594) 



(595) 



(596) 



(597) 



(598) 



+ 0(-) (599) 



where (a) follows from Remark 8 and Proposition 6 in Appendix C-A and $\Psi$ is expanded using
(224), (b) follows by Proposition 7 in Appendix C-A and the fact that $\left|e^{\gamma_n^{X^n,H^n}}\right| \le 1$, and in (c)
we used $\gamma_0(u) = c - u\delta_0(u)$ (see Proposition 7). Thus, (593) can be expressed as



E 



$ ( 1 + - + * ) e 7 " 

it 



1 + + 2<5 (w) ) E 



$e 1 



/ c u 
+ N(-- 25 (u) + -S (u) 
\u c 



E 



E 



Equating the RHS of (l600l) and the RHS of (f59TT) and solving for E 

(l-c + 2u5 (u) - ^5 (^ 



$e 7 " 



(600) 
leads to 



$e 7n 



N- 



l-c + u(l + 2S (u)) 



■E 




S (u) - a 2 5i(u) - cr 2 72 (u) ^tr A 



l-c + u(l + 2<y (u)) 



-E 



g7n 



+ 



1 



P(t) 



(601) 



for some polynomial $P(t)$.

This concludes the proof of part (i).

Proof of part (ii): We begin as in the proof of part (i). From the derivative of $\gamma_n^{X^n,H^n}$ in
(580)-(584) and standard Gaussian calculus, we have
HAH H x«,h 



in 



E 



trQ- 



1 



K 



i,j,k,l 



K 
1 

K X 



i,j,k,l 



dm 



(602) 



(603) 



r x",b™" 




ItrQe 7 " 


-E 



-trQtrQ— 



X",Jf" 



'■J 



x",r 



(604) 



71 

itJ-E 



r x".ff™i 




trQe 7 " 


-E 



1 _ ^#A# h 
trQtrQ — — 

if if 



-e 7 " 



K 



,HXX H AH" 

~Kn 



1 „HAH H HAH H 
K K v K 



+ 



Pi® 



K 2 



tr A' 



(605) 
for some polynomial $P_1(t)$, where the last line follows from the observation that the terms in
the derivative of 7^"^" resulting from (6^ n ' H ") 2 and K^ n ' Hn are of order 0(jLtr A 2 ) and 
0(j^tr A 2 ), respectively (this follows in particular from (A~ _1 tr A) 2 < A _1 tr A 2 ). 
Rearranging the terms and using the resolvent identity (224), one arrives at



E 



trQ- 



HAH V 



A 

n 



1 + —trQ ] e 7 " 



N 1 
cr 2 A 



trAE 



1 1 

cr 2 A 



trAE 



txQ- 



HH V 



A 




\ itJ-E 



-trQ 2 - + ^trQ f Q— - 



K 



Using the identity A — A 2 



Kn 



h\ 2 



+ 1 + 



t \ t 2 



K J K 2 
(606) 



tr A 



A, 



Tt 

ilJ-E 



/ n ^ 
Note now that 



HXX"AH" 

An 

HAH" 



+ -^trQ(Q- 



#A# H 



K 



Var 



Var 
Var 



trQ Q 



_ tr Q 2 



HAH H 



1 ^ 2 HA 2 H H 1 ^ ( H AH H 
trQ 2 77 + T^trQ Q 



A" 



A 



A 



A 



(607) 



A 



1 ~r,H H H ~H H H 
—trQ 2 AO 

A * A ^ A 



A 



A 



trQ I K -cx 2 Q A I^-a 2 Q A 



(608) 
(609) 



< 



'Var 



1 



^trQA 2 



+ J Var 



cr 



—trQ 2 A 2 



+ 4 /Var 



(7 



-tr (QA 



'Var 



° ' ^a tr a4 



O I 1 Itr A 2 



^trQ 2 AQA 
(610) 

(611) 



where the inequality follows from Remark 8 in Appendix D and the last line follows from a
direct application of Proposition 6 in Appendix C-A to each of the individual terms, along with
$\operatorname{tr} A^4 \le (\operatorname{tr} A^2)^2$. By Proposition 6,



Var 



Var 



1 ^ 2 HAH H 
—trQ 2 

A ^ A 

1 ^ 2 HA 2 H" 
A ^ A 



O [ -trA 2 



O [ -trA 4 



O I 1 ( J- trA 2 
A V A 



(612) 
(613) 
Thus, by Lemma 7 in Appendix D, the RHS of (607) can be written as



Tl 



HAH" 



HA 2 H H 



+ — trQ Q 



K 



HAH H 



K 



Tl 

i/J-E 



+ o 



2 HAH H 

K trQ ^r~ K 



,HA 2 H H 



K 



K 







r x™,^' 1 ! 


(«T H )T 


E 


g7n 



t 1 



-trA' 



By Proposition 7 in Appendix C-A, we can approximate the first two terms in (614) by

, HAH H 



E 



E 



K aQl — 



1 ^HA 2 H H 

— trQ 2 

K ^ K 



Tl (a 2 ) -^trA + O 



1 

^5 



trA 2 



^ 2 )> a+0 (tf) 



It remains to find an approximation of the term E 
pendix O 



7l (a 2 ) ^trA 2 + 
7i (a 2 ) ^trA 2 + o( 

>-g(g 



K 5 

1 1 



if 3 K 

2 



trA 4 
trA 2 



(614) 

(615) 
(616) 
(617) 
(618) 



HAH H 

K 



By Lemma [8] in Ap- 



E 



K 



txQ Q 



HAH 



H\ 2" 



K 



E 



] -ixHAH H Q 2 HAH H Q 



K 3 



-^^e [A# H g 2 #Aii H g] 



.7' 



1,3 



K 3 



9 [a# h q 2 #a# h q] ' 



(619) 
(620) 

(621) 



The derivative further develops as 
d [AH H Q 2 HAH H Q] 



1 



A 3J [Q 2 HAH H Q] U - - [AH H QH].. [Q 2 HAH H Q].. 



- - [AH H Q 2 H].. [QHAH^Q]^ + [AH H Q 2 HA] .. Q 



- - [AH"Q 2 HAH"QH] jj Q ii . 



(622) 
Replacing (622) in (621) and rearranging the resulting terms, we arrive at



E 



^trQ ( Q 



HAH 



h\ 2 



K 



E 



1 3 HAH H ( 1 

— trCr — 

K K \ K 



i + ^trg 



. 1 ^HAH V 
tTA -K trQ — 



E 



1 ^ 2 HAH 



H\ 2 



+ E 



1 1 ^ 2 HA 2 H H 



(623) 



Applying Proposition 7 in Appendix C-A together with Proposition 6 in Appendix C-A and
Lemma 7 in Appendix D to the individual terms leads to



E 



^trQ ( O 



HAH 



h\ 2 



A' 



1 + — trQ 

K * 



K 



: tr A 



72 (a 2 ) - 70 (a 2 ) 72 (a 2 ) - 7i (a 2 ) 2 + 5 (a 2 ) 7i (^ 2 ) > A 2 + 



K 



1 1 



(624) 



trA 2 



Similarly, by Lemma 7, Proposition 6, the variance bound in (611), and Proposition 7,



E 



H\ 2 



i + ^trg 



E 



E 



1 

A 



1 



trQ Q 



#A# 



H \ 2 



A 



E 



1 + ^trQ 



^trQ ( Q 



HAH 



h\ 2 



A 



(l + 5 (a 2 ))+O 



A 5 

1 1 

/pA" 



tr A 4 



: tr A' 



(625) 
(626) 



Equating the RHSs of (16241) and (16261) and solving for E 



T^trQ (Q 



HAH H 



K 



yields 



E 



HAH 



H\ 2 



A 



(^trA) z 72 (a 2 ) (1 - 7o (a 2 )) - 7l (a 2 ) 2 + <5 (a 2 ) 7l (a 2 ) ^tr A 



l + 5o(a 2 ) 



+ 



1 1 



rtrA' 



Similar to the proof of Part (i), let us define 

1 



* = ^trQ 



$ = trQ 



HAH H 
A ' 



(627) 

(628) 
(629) 
Putting the results from (606), (607), (614), (615), (618), and (627) together, we conclude that

e «i>i i • M')r- ' 

1 r x n ,H n i I I 



a 2 K 



rtrAE 



a 2 K 



rtrAE 



trQ 



+^71 (- 2 )(^ trA -^ trA2 ) E 



K 



HH" 

e 7n 



,7n 



+ it 



(±tr A) z [ 72 (a 2 ) (1 - 7o (a 2 )) - 7l (a 2 )^ + 5 (a 2 ) 7l (a 2 ) £tr A 



l + 5o(^ 2 



-E 



e 7» 



A(t) | tP 2 jt) 1 



(630) 



for two polynomials $P_1(t)$ and $P_2(t)$, where the term $t$ in front of $P_2(t)$ arises from the
pre-multiplication by at least $it$ of the various estimators involved.



in the LHS of the 



We now need to find an alternative representation of the term E 
last equation. Following the same arguments as in (15941 )- (|599l) . and using Vf^trA 2 < 1/VK, 
we can write 



E 



$\]>e 7n 



E[tf]E 
E[*]E 

E[tf]E 
E[tf]E 
+ E [$] E 



$e 7n 



+ E 



_ E[^])e 7 " 



$e 7 



$e 7 " 



x n H n 

$e 7n 



+ E[$]E (^-E[^])e 1 



+ o 



E [$] E [*] E 
E [$] E [*] E 



IN 11 „HH V 



a 2 K a 2 K irQ ~ K 



+ E [$] E 



+ 



^e 1 



(631) 
(632) 

(633) 



(634) 



E[tf]E 



--E 

a 2 



5o(a 2 )E 



$e 7 ™ 



+ 



A' 



a- 1 



- KE [*] E 











E 




E 


e 7n 

















E 







trQ 



$e 1 



(635) 



^7o (a 2 ) ^trAE 



Va 2 
trQ 



+ ( - - ^(a 2 ) ) 70 (^ 2 ) ^trAE 



K 



1 

1 



(636) 
From the last result and (630), we have



E 



$ (1 + e 7 " 



X",if" 



N -K6 (a 2 )) 7o (<x 2 ) ^trAE 



a 



-^7o(0 ^trAE 
cr A 



N 1 tir , 
--trAE 

a 1 A 



trQ 



PP H X n ,H™ 



A' 



1 



(637) 



iitrAE 



CP 



A 



trQ- 



A 



W^i (a 2 ) (^trA-ltrA 2 )E 



+ 



(£trA)* 72 (a 2 )(l- 70 (a 2 ))- 7l ( ( T 2 ) 2 + 5 (a 2 ) 7l (a 2 ) £tr A 



A 

P x (t) | tP 2 jt) 1 jrA2 



1 + 5o(a 2 ) 



-E 



>K VA A 
for some polynomials $P_1(t)$ and $P_2(t)$.



(638) 



Solving (637) and (638) for E



$e 7 " 



yields 



E 



$e 7n 



^+(A<5 (a 2 )-5) 7 o(a 2 )] itrAE 



^( 7o (a 2 )-l)itrAE trQ^fe 7 ' 



rH X n ,H" 



+ iM n 7l (. 2 )(^A-^A 2 ) E 



A l + 5 (a 2 ) 



e 7» 



if 



+ 




(£trA) 2 72(a 2 )(l-7o(^ 2 ))-7i(^ 2 ) 2 + 5 (^ 2 ) 7i (^ 2 ) ^tr A 



A 



A(g tP 2 jt) 1 2 
A A 



a+^ 2 )r 



-E 



e rn 



A 



(639) 



l_l 

Using now the result from part (i) as given in (601) to replace the term E trQ-^-e 7n
in the last equation, we finally obtain



E 



trQ 



HAH" 

K 



^+(^o(a 2 )-^)7o(^ 2 )] ^trA r ^ 
l + 5o(^ 2 ) r 

N itr A [ 7o (<x 2 ) - 1] [l - c + 2<T 2 5 (^ 2 ) - £<f„(<7 



+ 



[l + 5 (^ 2 )] [l-c + <7 2 (l + 2<5o(<7 2 ))] 



E 



;< . £tr A [ 7o (a 2 ) - 1] [jp (a 2 ) - a% (a 2 ) - a 2 72 (a 2 ) ±tx A] £ 



if 



+ 



[l + 5 (^ 2 )][l-c + a 2 (l + 25 ( ( 7 2 ))] 
«7i (^ 2 ) (/?trA- itrA 2 ). 



e 7n 



l + <* (<7 2 } 



-E 



e 7u 



(itr A) z [ 72 (a 2 ) (1 - 70 (^ 2 )) - 7i (^ 2 ) 2 J + $o (a 2 ) 7i (^ 2 ) £tr A 
* (l + 5 (a 2 )) 2 

Px(f) , fP 2 (f) l_ trA2 



-E 



This concludes the proof of part (ii). 



(640) 
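Appendix D
Gaussian tools and related results

Lemma 6 (Some matrix inequalities): For two $N \times N$ matrices $\mathbf{A}$ and $\mathbf{B}$, the following holds:

(i) $|\operatorname{tr} \mathbf{A}\mathbf{B}| \le \sqrt{\operatorname{tr} \mathbf{A}\mathbf{A}^{\mathsf{H}} \, \operatorname{tr} \mathbf{B}\mathbf{B}^{\mathsf{H}}}$.

If $\mathbf{A}$ is Hermitian nonnegative definite, it further holds that

(ii) $|\operatorname{tr} \mathbf{A}\mathbf{B}| \le \|\mathbf{B}\| \operatorname{tr} \mathbf{A}$,

(iii) $\frac{1}{N} \operatorname{tr} \mathbf{A} \le \|\mathbf{A}\|$.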
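The three inequalities of Lemma 6 can be spot-checked numerically. The following is a minimal sketch, assuming NumPy and reading $\|\cdot\|$ as the spectral norm; the helper name `lemma6_gaps` and the random test matrices are illustrative choices, not part of the original text.

```python
import numpy as np

def lemma6_gaps(A, B):
    """Return (rhs - lhs) for inequalities (i)-(iii); each gap should be >= 0.

    (ii) and (iii) assume that A is Hermitian nonnegative definite.
    """
    N = A.shape[0]
    tr_ab = abs(np.trace(A @ B))
    # (i): |tr AB| <= sqrt(tr AA^H * tr BB^H)
    gap_i = np.sqrt(np.trace(A @ A.conj().T).real
                    * np.trace(B @ B.conj().T).real) - tr_ab
    # (ii): |tr AB| <= ||B|| tr A, with ||.|| taken as the spectral norm
    gap_ii = np.linalg.norm(B, 2) * np.trace(A).real - tr_ab
    # (iii): (1/N) tr A <= ||A||
    gap_iii = np.linalg.norm(A, 2) - np.trace(A).real / N
    return gap_i, gap_ii, gap_iii

rng = np.random.default_rng(0)
N = 6
G = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
A = G @ G.conj().T  # Hermitian nonnegative definite by construction
B = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
print(lemma6_gaps(A, B))
```

A single random draw is of course no proof; the sketch only illustrates which quantity sits on which side of each bound.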

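Lemma 7 (Cauchy-Schwarz inequality): For two complex random variables $x$ and $y$,

$$|\mathbb{E}[xy]| \le \sqrt{\mathbb{E}\left[|x|^2\right]} \sqrt{\mathbb{E}\left[|y|^2\right]}.$$

Remark 8 (Application of the Cauchy-Schwarz inequality): Consider two random variables $x$ and $y$. By the Cauchy-Schwarz inequality,

$$\left|\mathbb{E}\left[(x - \mathbb{E}[x])(y - \mathbb{E}[y])\right]\right| \le \sqrt{\operatorname{Var}[x]} \sqrt{\operatorname{Var}[y]}. \tag{641}$$

Thus,

\begin{align}
|\mathbb{E}[xy]| &= \left|\mathbb{E}[x]\mathbb{E}[y] + \mathbb{E}\left[(x - \mathbb{E}[x])(y - \mathbb{E}[y])\right]\right| \tag{642} \\
&\le |\mathbb{E}[x]\mathbb{E}[y]| + \sqrt{\operatorname{Var}[x]} \sqrt{\operatorname{Var}[y]}. \tag{643}
\end{align}

Moreover, it follows that

\begin{align}
\operatorname{Var}[x + y] &= \operatorname{Var}[x] + \operatorname{Var}[y] + 2\operatorname{Re}\left\{\mathbb{E}\left[(x - \mathbb{E}[x])(y - \mathbb{E}[y])^*\right]\right\} \tag{644} \\
&\le \operatorname{Var}[x] + \operatorname{Var}[y] + 2\sqrt{\operatorname{Var}[x]} \sqrt{\operatorname{Var}[y]} \tag{645} \\
&= \left(\sqrt{\operatorname{Var}[x]} + \sqrt{\operatorname{Var}[y]}\right)^2. \tag{646}
\end{align}

Lemma 8 (Integration by parts formula [23, Equation (2.1.42)]): Let $\mathbf{x} = [x_1, \dots, x_N]^{\mathsf{T}} \sim \mathcal{CN}(\mathbf{0}, \mathbf{R})$ and let $f(\mathbf{x}) = f(x_1, \dots, x_N, x_1^*, \dots, x_N^*)$ be a $C^1$ complex function, polynomially bounded together with its derivatives. Then,

$$\mathbb{E}\left[x_i f(\mathbf{x})\right] = \sum_{j=1}^{N} R_{ij}\, \mathbb{E}\left[\frac{\partial f(\mathbf{x})}{\partial x_j^*}\right].$$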
Remark 9 (Integration by parts formula for functionals of matrices with i.i.d. entries): Let $f(\mathbf{W})$ be a $C^1$ complex function of the elements of $\mathbf{W}$ and $\mathbf{W}^*$, polynomially bounded together with its derivatives, where $\mathbf{W}$ has i.i.d. entries $W_{ij} \sim \mathcal{CN}(0, 1)$. Then

$$\mathbb{E}\left[W_{ij} f(\mathbf{W})\right] = \mathbb{E}\left[\frac{\partial f(\mathbf{W})}{\partial W_{ij}^*}\right]. \tag{647}$$
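The integration by parts formula can be illustrated by simulation in the scalar case. The sketch below assumes NumPy and picks the hypothetical test function $f(w, w^*) = w (w^*)^2$, for which $\partial f / \partial w^* = 2|w|^2$; both sides of the formula then equal $\mathbb{E}[|w|^4] = 2$ for $w \sim \mathcal{CN}(0,1)$. The function names are illustrative only.

```python
import numpy as np

def complex_gaussian(rng, size):
    """Draw i.i.d. CN(0, 1) samples (real and imaginary parts have variance 1/2)."""
    return (rng.standard_normal(size) + 1j * rng.standard_normal(size)) / np.sqrt(2)

def ibp_sides(num_samples=400_000, seed=0):
    """Monte Carlo estimates of E[w f(w)] and E[df/dw*] for f = w (w*)^2."""
    rng = np.random.default_rng(seed)
    w = complex_gaussian(rng, num_samples)
    f = w * np.conj(w) ** 2          # test function f(w, w*) = w (w*)^2
    df_dwconj = 2 * w * np.conj(w)   # Wirtinger derivative with respect to w*
    lhs = np.mean(w * f).real        # estimates E[w f(w)]
    rhs = np.mean(df_dwconj).real    # estimates E[df/dw*]
    return lhs, rhs

lhs, rhs = ibp_sides()
print(lhs, rhs)  # both estimates should be close to 2
```

The agreement is only up to Monte Carlo error; the sample size and seed are arbitrary.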



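Lemma 9 (Poincaré-Nash inequality [23, Proposition 2.1.6]): Let $\mathbf{x}$ and $f(\mathbf{x})$ be defined as in Lemma 8 and let $\nabla_{\mathbf{x}} f(\mathbf{x}) = [\partial f(\mathbf{x})/\partial x_1, \dots, \partial f(\mathbf{x})/\partial x_N]^{\mathsf{T}}$ and $\nabla_{\mathbf{x}^*} f(\mathbf{x}) = [\partial f(\mathbf{x})/\partial x_1^*, \dots, \partial f(\mathbf{x})/\partial x_N^*]^{\mathsf{T}}$. Then,

$$\operatorname{Var}[f(\mathbf{x})] \le \mathbb{E}\left[\nabla_{\mathbf{x}} f(\mathbf{x})^{\mathsf{T}} \mathbf{R}\, \nabla_{\mathbf{x}} f(\mathbf{x})^*\right] + \mathbb{E}\left[\nabla_{\mathbf{x}^*} f(\mathbf{x})^{\mathsf{H}} \mathbf{R}\, \nabla_{\mathbf{x}^*} f(\mathbf{x})\right].$$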

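Remark 10 (Poincaré-Nash inequality for functionals of matrices with i.i.d. entries): Let $f(\mathbf{W})$ be a function of the elements of $\mathbf{W}$ and $\mathbf{W}^*$ as in Remark 9, where $\mathbf{W} \in \mathbb{C}^{N \times n}$ has i.i.d. entries $W_{ij} \sim \mathcal{CN}(0, 1)$. Then,

$$\operatorname{Var}[f(\mathbf{W})] \le \sum_{i=1}^{N} \sum_{j=1}^{n} \mathbb{E}\left[\left|\frac{\partial f(\mathbf{W})}{\partial W_{ij}}\right|^2 + \left|\frac{\partial f(\mathbf{W})}{\partial W_{ij}^*}\right|^2\right].$$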
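As a small worked illustration of the Poincaré-Nash inequality, consider the hypothetical functional $f(\mathbf{W}) = \frac{1}{N}\operatorname{tr}\mathbf{W}\mathbf{W}^{\mathsf{H}}$ (chosen here for convenience, not taken from the text). Its Wirtinger derivatives are $W_{ij}^*/N$ and $W_{ij}/N$, so the bound evaluates to $2n/N$, while the exact variance is $n/N$. The sketch below, assuming NumPy, estimates both by simulation.

```python
import numpy as np

def poincare_nash_example(N=4, n=8, trials=20_000, seed=0):
    """Estimate Var[f(W)] and the Poincare-Nash bound for f(W) = tr(W W^H) / N."""
    rng = np.random.default_rng(seed)
    shape = (trials, N, n)
    W = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    sq = np.abs(W) ** 2
    f = sq.sum(axis=(1, 2)) / N
    var_f = f.var()
    # |df/dW_ij|^2 + |df/dW_ij^*|^2 = 2 |W_ij|^2 / N^2, summed over all entries
    bound = np.mean((2.0 * sq / N**2).sum(axis=(1, 2)))
    return var_f, bound

var_f, bound = poincare_nash_example()
print(var_f, bound)  # expected to be near n/N = 2 and 2n/N = 4, respectively
```

In this example the bound is loose by a factor of two, which is consistent with the inequality being an upper bound rather than an identity.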

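Lemma 10 (Identities for complex derivatives): Let $\mathbf{H} \in \mathbb{C}^{N \times K}$. Then,

\begin{align}
\frac{\partial H_{pq}}{\partial H_{ij}^*} &= \frac{\partial H_{pq}^*}{\partial H_{ij}} = 0, &
\frac{\partial [\mathbf{H}\mathbf{H}^{\mathsf{H}}]_{pq}}{\partial H_{ij}^*} &= \delta_{iq} H_{pj}, &
\frac{\partial [\mathbf{H}\mathbf{H}^{\mathsf{H}}]_{pq}}{\partial H_{ij}} &= \delta_{ip} H_{qj}^*, \notag \\
& &
\frac{\partial [\mathbf{H}^{\mathsf{H}}\mathbf{H}]_{pq}}{\partial H_{ij}^*} &= \delta_{jp} H_{iq}, &
\frac{\partial [\mathbf{H}^{\mathsf{H}}\mathbf{H}]_{pq}}{\partial H_{ij}} &= \delta_{jq} H_{ip}^*. \tag{648}
\end{align}

Moreover, denote $\mathbf{Q} = \left(\frac{1}{K}\mathbf{H}\mathbf{H}^{\mathsf{H}} + x\mathbf{I}_N\right)^{-1}$ and $\tilde{\mathbf{Q}} = \left(\frac{1}{K}\mathbf{H}^{\mathsf{H}}\mathbf{H} + x\mathbf{I}_K\right)^{-1}$ for some $x > 0$. Then,

\begin{align}
\frac{\partial Q_{pq}}{\partial H_{ij}^*} &= -\frac{1}{K} [\mathbf{Q}\mathbf{H}]_{pj}\, Q_{iq}, \notag \\
\frac{\partial Q_{pq}}{\partial H_{ij}} &= -\frac{1}{K} [\mathbf{H}^{\mathsf{H}}\mathbf{Q}]_{jq}\, Q_{pi}, \notag \\
\frac{\partial \tilde{Q}_{pq}}{\partial H_{ij}^*} &= -\frac{1}{K} \tilde{Q}_{pj}\, [\mathbf{H}\tilde{\mathbf{Q}}]_{iq}. \notag
\end{align}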



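Corollary 1: Let $\mathbf{H} \in \mathbb{C}^{N \times K}$ and $\mathbf{C} \in \mathbb{C}^{N \times N}$. Denote $\mathbf{Q} = \left(\frac{1}{K}\mathbf{H}\mathbf{H}^{\mathsf{H}} + x\mathbf{I}_N\right)^{-1}$ for some $x > 0$. Then,

(i) $\dfrac{\partial \operatorname{tr} \mathbf{Q}\mathbf{C}}{\partial H_{ij}^*} = -\dfrac{1}{K} [\mathbf{Q}\mathbf{C}\mathbf{Q}\mathbf{H}]_{ij}$,

(ii) $\dfrac{\partial \operatorname{tr} \mathbf{H}\mathbf{H}^{\mathsf{H}}\mathbf{C}}{\partial H_{ij}^*} = [\mathbf{C}\mathbf{H}]_{ij}$.

Proof: For part (i), it follows from Lemma 10 that

\begin{align}
\frac{\partial \operatorname{tr} \mathbf{Q}\mathbf{C}}{\partial H_{ij}^*} &= \sum_{p,q} \frac{\partial Q_{pq}}{\partial H_{ij}^*} C_{qp} \notag \\
&= -\frac{1}{K} \sum_{p,q} [\mathbf{Q}\mathbf{H}]_{pj}\, Q_{iq}\, C_{qp} \notag \\
&= -\frac{1}{K} [\mathbf{Q}\mathbf{C}\mathbf{Q}\mathbf{H}]_{ij}. \tag{651}
\end{align}

For part (ii), we similarly have from Lemma 10

\begin{align}
\frac{\partial \operatorname{tr} \mathbf{H}\mathbf{H}^{\mathsf{H}}\mathbf{C}}{\partial H_{ij}^*} &= \sum_{p,q} \frac{\partial [\mathbf{H}\mathbf{H}^{\mathsf{H}}]_{pq}}{\partial H_{ij}^*} C_{qp} \tag{652} \\
&= \sum_{p,q} \delta_{iq} H_{pj} C_{qp} \tag{653} \\
&= [\mathbf{C}\mathbf{H}]_{ij}. \tag{654}
\end{align}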
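Both parts of Corollary 1 can be compared against a finite-difference estimate of the Wirtinger derivative, using $\partial/\partial z^* = \frac{1}{2}(\partial/\partial a + \mathrm{i}\,\partial/\partial b)$ for $z = a + \mathrm{i}b$. The following is a minimal sketch assuming NumPy; the helper names `resolvent` and `d_dconj`, the matrix sizes, and the step size are illustrative assumptions.

```python
import numpy as np

def resolvent(H, x):
    """Q = (H H^H / K + x I_N)^{-1} for an N x K matrix H and x > 0."""
    N, K = H.shape
    return np.linalg.inv(H @ H.conj().T / K + x * np.eye(N))

def d_dconj(g, H, i, j, eps=1e-6):
    """Central-difference estimate of the Wirtinger derivative dg/dH_ij^*.

    With H_ij = a + 1j*b, d/dH_ij^* = (d/da + 1j * d/db) / 2.
    """
    E = np.zeros(H.shape, dtype=complex)
    E[i, j] = 1.0
    d_a = (g(H + eps * E) - g(H - eps * E)) / (2 * eps)
    d_b = (g(H + 1j * eps * E) - g(H - 1j * eps * E)) / (2 * eps)
    return 0.5 * (d_a + 1j * d_b)

rng = np.random.default_rng(1)
N, K, x = 3, 5, 0.7
H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
C = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
i, j = 1, 3

Q = resolvent(H, x)
# Part (i): derivative of tr(Q C) versus -(1/K) [Q C Q H]_ij
num_i = d_dconj(lambda M: np.trace(resolvent(M, x) @ C), H, i, j)
ana_i = -(Q @ C @ Q @ H)[i, j] / K
# Part (ii): derivative of tr(H H^H C) versus [C H]_ij
num_ii = d_dconj(lambda M: np.trace(M @ M.conj().T @ C), H, i, j)
ana_ii = (C @ H)[i, j]
print(abs(num_i - ana_i), abs(num_ii - ana_ii))
```

The two printed differences should be at the level of finite-difference and rounding error.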
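Theorem 5: Let $\{\mathbf{H}^n\}_{n=1}^{\infty}$, where $\mathbf{H}^n \in \mathbb{C}^{N \times K}$ has i.i.d. entries $H_{ij}^n \sim \mathcal{CN}(0, 1)$. For $u > 0$, let $\mathbf{Q}^n(u) = \left(\frac{1}{K}\mathbf{H}^n(\mathbf{H}^n)^{\mathsf{H}} + u\mathbf{I}_N\right)^{-1}$ and $\tilde{\mathbf{Q}}^n(u) = \left(\frac{1}{K}(\mathbf{H}^n)^{\mathsf{H}}\mathbf{H}^n + u\mathbf{I}_K\right)^{-1}$. Then, as $n \xrightarrow{(\beta, c)} \infty$,

\begin{align}
\mathbb{E}\left[\frac{1}{K} \operatorname{tr} \mathbf{Q}^n(u)\right] &= \delta_0(u) + O\left(\frac{1}{u^4 n^2}\right) \notag \\
\mathbb{E}\left[\frac{1}{K} \operatorname{tr} \tilde{\mathbf{Q}}^n(u)\right] &= \tilde{\delta}_0(u) + O\left(\frac{1}{u^4 n^2}\right) \notag
\end{align}

where

\begin{align}
\delta_0(u) &= \frac{c - 1}{2u} - \frac{1}{2} + \frac{\sqrt{(1 - c + u)^2 + 4cu}}{2u} \notag \\
\tilde{\delta}_0(u) &= \delta_0(u) - \frac{c - 1}{u}. \notag
\end{align}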
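The closed form of $\delta_0(u)$ can be compared with a Monte Carlo estimate of $\mathbb{E}[\frac{1}{K}\operatorname{tr}\mathbf{Q}^n(u)]$ for moderate dimensions. The sketch below assumes NumPy and takes $c = N/K$ exactly; the function names, dimensions, and number of trials are illustrative and not taken from the text.

```python
import numpy as np

def delta0(u, c):
    """Closed form (c-1)/(2u) - 1/2 + sqrt((1-c+u)^2 + 4cu)/(2u)."""
    return (c - 1) / (2 * u) - 0.5 + np.sqrt((1 - c + u) ** 2 + 4 * c * u) / (2 * u)

def empirical_trace(N, K, u, trials=200, seed=0):
    """Monte Carlo estimate of E[(1/K) tr (H H^H / K + u I_N)^{-1}]."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
        eig = np.linalg.eigvalsh(H @ H.conj().T / K)  # real eigenvalues
        total += np.sum(1.0 / (eig + u)) / K
    return total / trials

N, K, u = 40, 80, 1.0
c = N / K  # assumption: c is taken as the exact ratio N/K
d = delta0(u, c)
est = empirical_trace(N, K, u)
print(d, est)
```

The two numbers are expected to agree up to Monte Carlo noise and the vanishing correction term of the theorem.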



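Proof: The proof follows from a direct adaptation of [23, Theorem 7.2.2]; see also [33, Theorem 3 and Proposition 5] for a more complex matrix model. As opposed to these works, we maintain the dependence on $u$ in the bounds. Let $\Psi_n(u) = \mathbb{E}\left[\frac{1}{K} \operatorname{tr} \mathbf{Q}^n(u)\right]$. Then,

\begin{align}
\Psi_n(u) &\overset{(a)}{=} \frac{c}{u} - \frac{1}{u} \mathbb{E}\left[\frac{1}{K} \operatorname{tr} \mathbf{Q}^n(u) \frac{1}{K} \mathbf{H}^n (\mathbf{H}^n)^{\mathsf{H}}\right] \tag{655} \\
&\overset{(b)}{=} \frac{c}{u} - \frac{1}{u} \Psi_n(u) + \frac{1}{u} \mathbb{E}\left[\frac{1}{K} \operatorname{tr} \mathbf{Q}^n(u) \cdot \frac{1}{K} \operatorname{tr} \mathbf{Q}^n(u) \frac{1}{K} \mathbf{H}^n (\mathbf{H}^n)^{\mathsf{H}}\right] \tag{656} \\
&\overset{(c)}{=} \frac{c}{u} - \frac{1}{u} \Psi_n(u) + \frac{c}{u} \Psi_n(u) - \mathbb{E}\left[\left(\frac{1}{K} \operatorname{tr} \mathbf{Q}^n(u)\right)^2\right] \tag{657} \\
&\overset{(d)}{=} \frac{c}{u} - \frac{1}{u} \Psi_n(u) + \frac{c}{u} \Psi_n(u) - \Psi_n(u)^2 + \varepsilon_n \tag{658}
\end{align}

with $|\varepsilon_n| \le \frac{2N}{u^4 K^3}$, where (a) is due to the resolvent identity (224), (b) follows from an application of Lemma 8 together with the derivation rules in Lemma 10 in Appendix D, (c) follows again from the resolvent identity (224), and (d) results from Remark 8 in Appendix D along with

\begin{align}
|\varepsilon_n| &\le \operatorname{Var}\left[\frac{1}{K} \operatorname{tr} \mathbf{Q}^n(u)\right] \tag{659} \\
&\le \frac{2}{K^2} \sum_{i,j} \mathbb{E}\left[\left|\frac{\partial \operatorname{tr} \mathbf{Q}^n(u)}{\partial (H_{ij}^n)^*}\right|^2\right] \tag{660} \\
&= \frac{2}{K^2} \sum_{i,j} \mathbb{E}\left[\left|-\frac{1}{K} \left[\mathbf{Q}^n(u)^2 \mathbf{H}^n\right]_{ij}\right|^2\right] \tag{661} \\
&= \frac{2}{K^4} \mathbb{E}\left[\operatorname{tr} \mathbf{Q}^n(u)^2 \mathbf{H}^n (\mathbf{H}^n)^{\mathsf{H}} \mathbf{Q}^n(u)^2\right] \tag{662}
\end{align}

from Remark 10 and then the fact that

$$\frac{2}{K^4} \mathbb{E}\left[\operatorname{tr} \mathbf{Q}^n(u)^2 \mathbf{H}^n (\mathbf{H}^n)^{\mathsf{H}} \mathbf{Q}^n(u)^2\right] \le \frac{2}{u^4 K^4} \mathbb{E}\left[\operatorname{tr} \mathbf{H}^n (\mathbf{H}^n)^{\mathsf{H}}\right] = \frac{2N}{u^4 K^3}. \tag{663}$$

By (658) and Property 1 (iii) in Appendix D, which can be written $\delta_0(u) = \frac{c}{u} - \frac{1}{u}\delta_0(u) + \frac{c}{u}\delta_0(u) - \delta_0(u)^2$, we obtain

\begin{align}
\Psi_n(u) - \delta_0(u) &= \frac{1 - c}{u} \left(\delta_0(u) - \Psi_n(u)\right) + \delta_0(u)^2 - \Psi_n(u)^2 + \varepsilon_n \tag{664} \\
&= \left(\delta_0(u) - \Psi_n(u)\right) \left(\frac{1 - c}{u} + \delta_0(u) + \Psi_n(u)\right) + \varepsilon_n. \tag{665}
\end{align}

Remark now that

$$1 + \frac{1 - c}{u} + \delta_0(u) + \Psi_n(u) = \frac{c}{u\delta_0(u)} + \Psi_n(u) > \frac{c}{u\delta_0(u)} > 1 \tag{666}$$

which follows respectively from Property 1 (iii) in Appendix D, then from $u\Psi_n(u) > 0$ and $\delta_0(u) > 0$ (Property 1 (i)), and finally from $u\delta_0(u)/c < 1$ (Property 1 (ii)).

Gathering the terms in $\delta_0(u) - \Psi_n(u)$ of (665), according to this reasoning, we obtain

\begin{align}
|\Psi_n(u) - \delta_0(u)| &= \frac{|\varepsilon_n|}{1 + \frac{1 - c}{u} + \delta_0(u) + \Psi_n(u)} \tag{667} \\
&< |\varepsilon_n|. \tag{668}
\end{align}

This terminates the proof of the first part. For the second part, it is sufficient to notice that $\frac{1}{K} \operatorname{tr} \tilde{\mathbf{Q}}^n(u) = \frac{1}{K} \operatorname{tr} \mathbf{Q}^n(u) - \frac{c - 1}{u}$. ■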
Remark 11: The function $s(z) = \delta_0(-z)/c$ for $z \in \mathbb{C} \setminus \mathbb{R}_+$ corresponds to the Stieltjes transform of the Marčenko-Pastur law. For more details, see, e.g., [32, Chapter 3.2].

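Property 1 (Some properties of $\delta_0(u)$): The function $\delta_0(u)$, $u > 0$, as defined in Theorem 5 satisfies

(i) $\delta_0(u) > \dfrac{c}{(1 + \sqrt{c})^2 + u} > 0$

(ii) $\delta_0(u) < \dfrac{c}{u}$

(iii) $\delta_0(u) = \dfrac{c}{1 - c + u(1 + \delta_0(u))}$

(iv) $\dfrac{\delta_0(u)}{1 + \delta_0(u)} = c - u\delta_0(u)$

(v) $1 - c + u\delta_0(u) = \dfrac{1}{1 + \delta_0(u)}$

(vi) $\delta_0'(x) = -\dfrac{\delta_0(x)(1 + \delta_0(x))}{1 - c + x(1 + 2\delta_0(x))}$.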
Proof: Properties (i)-(iii) are due to $\delta_0(u) = c\,m(-u)$, where $m(z)$ is the Stieltjes transform of the Marčenko-Pastur law with support in $[(1 - \sqrt{c})^2, (1 + \sqrt{c})^2] \cup \{0\}$ (see Remark 11 in Appendix D). Property (iv) follows from (iii) since

\begin{align}
& \delta_0(u) = \frac{c}{1 - c + u(1 + \delta_0(u))} \tag{669} \\
\Longleftrightarrow\quad & \delta_0(u) = (1 + \delta_0(u))\, c - u\delta_0(u)(1 + \delta_0(u)) \tag{670} \\
\Longleftrightarrow\quad & \frac{\delta_0(u)}{1 + \delta_0(u)} = c - u\delta_0(u). \tag{671}
\end{align}

Property (v) follows from (iii) and (iv). Property (vi) is obtained from the differentiation of

$$c = \delta_0(x)(1 - c + x) + x\delta_0(x)^2 \tag{672}$$

which follows from Property (iii). ■
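Theorem 6: Let $\{\mathbf{H}^n\}_{n=1}^{\infty}$, where $\mathbf{H}^n \in \mathbb{C}^{N \times K}$ has i.i.d. entries $H_{ij}^n \sim \mathcal{CN}(0, 1)$. Let $\sigma^2 > 0$. Then, as $n \xrightarrow{(\beta, c)} \infty$,

$$\mathbb{E}\left[\frac{1}{K} \log\det\left(\mathbf{I}_N + \frac{1}{\sigma^2} \frac{1}{K} \mathbf{H}^n (\mathbf{H}^n)^{\mathsf{H}}\right)\right] = C(\sigma^2) + O\left(\frac{1}{n^2}\right)$$

where

$$C(\sigma^2) = \log\left(1 + \delta_0(\sigma^2)\right) + c \log\left(1 + \frac{1}{\sigma^2 (1 + \delta_0(\sigma^2))}\right) - \frac{\delta_0(\sigma^2)}{1 + \delta_0(\sigma^2)}$$

and $\delta_0(\sigma^2)$ as defined in Theorem 5.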
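The expression for $C(\sigma^2)$ can be evaluated directly and compared with a simulated value of the normalized log-determinant. The sketch below assumes NumPy, natural logarithms, and $c = N/K$; the function names, dimensions, and trial count are illustrative assumptions rather than anything prescribed by the text.

```python
import numpy as np

def delta0(u, c):
    """Closed form of delta_0(u) as stated in Theorem 5."""
    return (c - 1) / (2 * u) - 0.5 + np.sqrt((1 - c + u) ** 2 + 4 * c * u) / (2 * u)

def capacity(sigma2, c):
    """C(sigma^2) = log(1+d) + c log(1 + 1/(sigma^2 (1+d))) - d/(1+d), d = delta_0(sigma^2)."""
    d = delta0(sigma2, c)
    return np.log1p(d) + c * np.log1p(1.0 / (sigma2 * (1.0 + d))) - d / (1.0 + d)

def empirical_capacity(N, K, sigma2, trials=200, seed=0):
    """Monte Carlo estimate of E[(1/K) log det(I_N + H H^H / (sigma^2 K))]."""
    rng = np.random.default_rng(seed)
    acc = 0.0
    for _ in range(trials):
        H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
        G = np.eye(N) + H @ H.conj().T / (sigma2 * K)
        _, logdet = np.linalg.slogdet(G)  # G is Hermitian positive definite
        acc += logdet / K
    return acc / trials

N, K, sigma2 = 40, 40, 1.0
theory = capacity(sigma2, N / K)
simulated = empirical_capacity(N, K, sigma2)
print(theory, simulated)
```

For these parameters the simulated value should be close to the closed-form value, with the gap dominated by Monte Carlo noise.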

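Proof: This result is an immediate consequence of [33, Theorem 1]. For the sake of completeness, we rederive it for our particular setting. From Theorem 5,

\begin{align}
\mathbb{E}\left[\frac{1}{K} \log\det\left(\mathbf{I}_N + \frac{1}{\sigma^2} \frac{1}{K} \mathbf{H}^n (\mathbf{H}^n)^{\mathsf{H}}\right)\right] &= \int_{\sigma^2}^{\infty} \mathbb{E}\left[\frac{1}{u} \frac{1}{K} \operatorname{tr} \mathbf{Q}^n(u) \frac{1}{K} \mathbf{H}^n (\mathbf{H}^n)^{\mathsf{H}}\right] du \tag{673} \\
&= \int_{\sigma^2}^{\infty} \left(\frac{c}{u} - \delta_0(u)\right) du + O\left(\frac{1}{n^2}\right). \tag{674}
\end{align}

We therefore only need to evaluate the integral term in the RHS. To proceed, following a similar approach as in [38], it is useful to consider $C(x) = C(x, \delta_0(x))$ for a function $C(x, \delta)$ in the two variables $(x, \delta)$ so to write, with some slight abuse of notation,

$$C'(x) = \frac{d}{dx} C(x, \delta_0(x)) = \frac{\partial C}{\partial x}(x, \delta_0(x)) + \frac{\partial C}{\partial \delta}(x, \delta_0(x))\, \delta_0'(x) \tag{675}$$

where

\begin{align}
\frac{\partial C}{\partial \delta}(x, \delta_0(x)) &= \frac{\delta_0(x)}{(1 + \delta_0(x))^2} - \frac{c}{1 + \delta_0(x)} \frac{1}{1 + x(1 + \delta_0(x))} = 0 \tag{676} \\
\frac{\partial C}{\partial x}(x, \delta_0(x)) &= -\frac{c}{x} \frac{1}{1 + x(1 + \delta_0(x))} = -\frac{c}{x} + \frac{c(1 + \delta_0(x))}{1 + x(1 + \delta_0(x))} = -\frac{c}{x} + \delta_0(x) \tag{677}
\end{align}

both lines being consequences of Property 1 (iii). We conclude that $C'(\sigma^2) = -\frac{c}{\sigma^2} + \delta_0(\sigma^2)$ which, along with $\lim_{\sigma^2 \to \infty} C(\sigma^2) = 0$, concludes the proof. ■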
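Lemma 11: Let $\sigma^2, c > 0$ and $\delta_m(x)$, $m \ge 0$, be as defined in Proposition 7 in Appendix C-A. Then,

(i) $$\int_{\sigma^2}^{\infty} \frac{c\left(1 - c + 2u\delta_0(u)\right) - u^2\delta_0(u)^2}{u\left(1 - c + u(1 + 2\delta_0(u))\right)}\, du = \log\left(1 + \delta_0(\sigma^2)\right) + c \log\left(1 + \frac{1}{\sigma^2 (1 + \delta_0(\sigma^2))}\right) - \frac{\delta_0(\sigma^2)}{1 + \delta_0(\sigma^2)}$$

(ii) $$\int_{\sigma^2}^{\infty} \frac{\delta_0(u) - \sigma^2 \delta_1(u)}{1 - c + u(1 + 2\delta_0(u))}\, du = -\log\left(1 - \frac{1}{c} \frac{\delta_0(\sigma^2)^2}{(1 + \delta_0(\sigma^2))^2}\right).$$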
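Proof:

Proof of part (i): We simply notice here that

\begin{align}
\frac{c\left(1 - c + 2u\delta_0(u)\right) - u^2\delta_0(u)^2}{u\left(1 - c + u(1 + 2\delta_0(u))\right)} &= \frac{c}{u} - \frac{u\delta_0(u)^2 + c}{1 - c + u(1 + 2\delta_0(u))} \tag{678} \\
&= \frac{c}{u} - \delta_0(u) \frac{u\delta_0(u)^2 + c}{u\delta_0(u)^2 + c} \tag{679} \\
&= \frac{c}{u} - \delta_0(u) \tag{680}
\end{align}

where we used Property 1 (iii) in the second equality. The result then unfolds from Theorem 6.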
Proof of part (ii): We start with the following calculus: 

8 (u) - a 2 <5i(w) 



OO 



2 1 - c + u(l + 28 (u)) 
*>(«) 



du 



+ 



a 2 S (u)(l + 5 (a 2 )) 



1 - c + u(l + 2S (u)) (1 - c + a 2 (l + a 2 ) + u5 (u))(l -c + u(l + 25 (u))) 



5 (u)5' (u) 



+ 



a 



2 6' (u)(l + 6 ( ( r 2 )) 



du 



du 
(681) 

(682) 



<y (iO(l + <y„(iO) l + cr 2 (l + 5 (a 2 )) + 5 (u)a 2 (l + 5 (a 2 )) 
where in the first equality we developed the expression of $\delta_1(u)$ and in the second equality we introduced $\delta_0'(u)$ in both numerators and, in the second denominator, iterated the relation $x\delta_0(x)^2 = c - \delta_0(x)(1 - c + x)$ (from Property 1 (iii)) in order to maintain a degree one polynomial in $\delta_0(u)$. Writing $\delta_0(u)\delta_0'(u) = [2\delta_0(u)\delta_0'(u) + \delta_0'(u)] - \delta_0'(u)(1 + \delta_0(u))$ in the numerator of the first

,6* 1 

/■oo 



term, we then find 

-du 



- c + w(l + 25 (w)) 

5o(u)(l + 5oM) <5oH 



+ 



4(»)(lHo(a 2 )) 
l + a 2 (l + 5 (c7 2 )) + 5oHa 2 (] 



= [- log(l + 5 (u)) + log(l + a 2 (l + 5 (^))(1 + 5oH))]: =ff2 
= log(l + 5 (a 2 )) + log(l + a 2 (l + ^(a 2 ))) - log(l + a 2 (l + S { 

, Al + ^o(^))(l + ^(l + ^o(^))) \ 
g V l + a 2 (l + 5 (a 2 )) 2 



»a 2 (l + 5 (t7 2 )). 



-o(- 2 )) 2 ) 



1 + a 2 (l + 5 (a 2 )) 2 
At this point, remark that 



(l + £ (q 2 ))(l + q 2 (l + £ (q 2 ))) 



and that 



l + a 2 (l 



+ 5 (a 2 )) 2 



= 1 



- <x 2 (l + 5 (a 2 )) 2 



1 + a 2 (l + 5oK)) 2 = 1 + a 2 + a 2 5 (a A ) + c + c5 (a 2 ) 

+ 2c + c5 (a 2 ) 



,a + S (a 2 )) 2 



5n(a 2 ) 



using Property 1 (iii) in the second equality.
This allows us to finally conclude that

$$\int_{\sigma^2}^{\infty} \frac{\delta_0(u) - \sigma^2 \delta_1(u)}{1 - c + u(1 + 2\delta_0(u))}\, du = -\log\left(1 - \frac{\delta_0(\sigma^2)^2}{c\,(1 + \delta_0(\sigma^2))^2}\right). \tag{691}$$